7.1. String Escape Characters

Figure 7.15. Why do we have '\r\n' on Windows?

Table 7.7. Frequently used escape characters

Sequence    Description
--------    ------------------------
\n          New line (LF - Linefeed)
\r          Carriage Return (CR)
\t          Horizontal Tab (TAB)
\'          Single quote '
\"          Double quote "
\\          Backslash \
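As a small illustration of the table above, the sketch below checks that each escape sequence is typed as two characters but stored as one. The variable names are made up for this example.

```python
# Each escape sequence is written with two characters in source code
# but is stored as a single character in the string.
newline = '\n'
tab = '\t'
backslash = '\\'

print(len(newline), len(tab), len(backslash))   # 1 1 1
print(ord(newline), ord('\r'), ord(tab))        # 10 13 9

# Windows-style line endings are CR followed by LF.
text = 'first\r\nsecond'
print(text.splitlines())                        # ['first', 'second']
```

Note that splitlines() treats both '\n' and '\r\n' as line boundaries, which is convenient when the origin of a text is unknown.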

Table 7.8. Less frequently used escape characters

Sequence      Description
----------    --------------------------------------------------
\a            Bell (BEL)
\b            Backspace (BS)
\f            New page (FF - Form Feed)
\v            Vertical Tab (VT)
\uF680        Character with 16-bit (2 bytes) hex value F680
\U0001F680    Character with 32-bit (4 bytes) hex value 0001F680
\101          Character with octal value 101 (the letter A); one to three octal digits, no letter prefix
\x41          Character with hex value 41 (the letter A); exactly two hex digits

print('\U0001F680')     # 🚀
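The numeric escapes differ in how many digits they consume, which is easy to get wrong. The following sketch, with variable names chosen only for illustration, shows the fixed widths in practice:

```python
# \xhh takes exactly two hex digits; \ooo takes one to three octal digits.
print('\x41', '\101')                  # A A

# \uXXXX takes exactly four hex digits, \UXXXXXXXX exactly eight.
euro = '\u20ac'
rocket = '\U0001F680'
print(len(euro), hex(ord(euro)))       # 1 0x20ac
print(len(rocket), hex(ord(rocket)))   # 1 0x1f680

# Extra digits after \x are not part of the escape:
# this is '\x1f' followed by the ordinary characters '6', '8', '0'.
greedy = '\x1F680'
print(len(greedy))                     # 4
```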

7.1.1. Case Study

  • Problem with paths on Windows

Let's take a look at file path notation on POSIX-compliant operating systems.

Linux paths:

>>> print('/home/mwatney/newfile.txt')
/home/mwatney/newfile.txt

macOS paths:

>>> print('/Users/mwatney/newfile.txt')
/Users/mwatney/newfile.txt

And now for something completely different, Windows paths:

>>> print('C:/Users/mwatney/newfile.txt')
C:/Users/mwatney/newfile.txt

However, Windows conventionally uses a backslash rather than a slash as the path separator. This is where all the problems start. Let's change the slashes to backslashes one at a time, starting from the end (the one before newfile.txt):

>>> print('C:/Users/mwatney\newfile.txt')
C:/Users/mwatney
ewfile.txt

This happens because \n is the newline character. For the path to print correctly, we need to escape the backslash itself:

>>> print('C:/Users/mwatney\\newfile.txt')
C:/Users/mwatney\newfile.txt
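To see what is actually stored in each string, it can help to compare repr() output and lengths. This is only a sketch, and the names broken and fixed are made up for the example:

```python
broken = 'C:/Users/mwatney\newfile.txt'    # \n is read as one newline character
fixed = 'C:/Users/mwatney\\newfile.txt'    # \\ is read as one backslash

# repr() shows special characters in their escaped form.
print(repr(broken))
print(repr(fixed))

# The fixed string is one character longer: backslash + 'n' instead of a newline.
print(len(fixed) - len(broken))            # 1
print('\n' in broken, '\n' in fixed)       # True False
```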

This is better. Now the next slash, the one before mwatney:

>>> print('C:/Users\mwatney\\newfile.txt')
C:/Users\mwatney\newfile.txt

Since Python 3.12, an unrecognized escape sequence (in this case \m) produces a warning (SyntaxWarning: invalid escape sequence '\m'). The backslash should either be escaped or the text put inside a raw string. For now this is only a warning, so the code still runs, but it is expected to become an error in a future version, so it is better to avoid it now:

>>> print('C:/Users\\mwatney\\newfile.txt')
C:/Users\mwatney\newfile.txt
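The warning can be observed from code by compiling the offending literal at runtime. The sketch below is one possible way to do that. The warning category and the exact message depend on the Python version (older releases report a DeprecationWarning), so no output is shown for them.

```python
import warnings

# Source text of a string literal that contains the unrecognized escape \m.
# It is kept in a raw string so that this example itself stays warning-free.
source = r"'C:/Users\mwatney'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')    # record every warning
    try:
        value = eval(compile(source, '<demo>', 'eval'))
    except SyntaxError:
        # Assumption: a future Python version may reject the literal outright.
        value = None

for warning in caught:
    print(warning.category.__name__, warning.message)

if value is not None:
    print(value)    # the backslash is kept as-is: C:/Users\mwatney
```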

Ok, we are getting somewhere. The last slash (the one before Users):

>>> print('C:\Users\\mwatney\\newfile.txt')
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

This time the problem is more serious. The problem is with \Users. After the escape sequence \U, Python expects a Unicode code point written as exactly eight hexadecimal digits, e.g. \U0001F600, which is the smiley 😀 emoticon. In this example, Python finds the letter s, which is not a valid hexadecimal digit, and therefore raises a SyntaxError saying that the escape could not be decoded. The only valid hexadecimal digits are 0123456789abcdefABCDEF, and the letter s isn't one of them.
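Because this error is raised while the source is being parsed, it cannot be caught by a try block in the same file. One way to reproduce it safely, sketched here with illustrative names, is to compile the literal at runtime:

```python
import string

# The set of valid hexadecimal digits; the letter s is not among them.
print(string.hexdigits)           # 0123456789abcdefABCDEF
print('s' in string.hexdigits)    # False

# Source text of the failing literal, kept in a raw string.
source = r"'C:\Users\\mwatney\\newfile.txt'"

try:
    compile(source, '<demo>', 'eval')
except SyntaxError as error:
    raised = True
    print(type(error).__name__)   # SyntaxError
else:
    raised = False

# A complete \U escape with eight hex digits is a single character.
print('\U0001F600' == chr(0x1F600))    # True
```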

There are two ways to avoid this problem. Escape every backslash:

>>> print('C:\\Users\\mwatney\\newfile.txt')
C:\Users\mwatney\newfile.txt

Or use a raw string (r-string):

>>> print(r'C:\Users\mwatney\newfile.txt')
C:\Users\mwatney\newfile.txt

Both produce the same output, so you can choose either one. In my opinion, r-strings are less error-prone, and I use them whenever I deal with paths or regular expressions.
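A quick check confirms that the two spellings produce the same string. The sketch also shows a third option that this section does not cover: pathlib.PureWindowsPath from the standard library, which accepts forward slashes and renders backslashes. It is offered only as an aside, and the variable names are illustrative.

```python
from pathlib import PureWindowsPath

escaped = 'C:\\Users\\mwatney\\newfile.txt'
raw = r'C:\Users\mwatney\newfile.txt'
print(escaped == raw)    # True

# PureWindowsPath only manipulates the text of a path, so it works on any OS.
path = PureWindowsPath('C:/Users/mwatney/newfile.txt')
print(path)              # C:\Users\mwatney\newfile.txt
print(path.name)         # newfile.txt
```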