7.1. String Escape Characters
- `\r\n` is used on Windows
- `\n` is used everywhere else
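A small sketch of the difference between the two conventions. The sample strings are made up for illustration and do not come from a real file. `str.splitlines()` recognizes both kinds of line ending:

```python
# Hypothetical sample text using each line-ending convention
windows_text = 'first\r\nsecond\r\n'
unix_text = 'first\nsecond\n'

# str.splitlines() treats both \r\n and \n as line boundaries
windows_lines = windows_text.splitlines()
unix_lines = unix_text.splitlines()

print(windows_lines == unix_lines)  # True
print(len('\r\n'), len('\n'))       # 2 1
```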
Sequence | Description |
---|---|
`\n` | New line (LF - Linefeed) |
`\r` | Carriage Return (CR) |
`\t` | Horizontal Tab (TAB) |
`\'` | Single quote |
`\"` | Double quote |
`\\` | Backslash |
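As a quick check, the following sketch shows that each of these escape sequences takes two characters in source code but produces a single character in the resulting string. The variable names are only for illustration:

```python
# Each escape sequence is written with two characters,
# but the resulting string holds exactly one character
newline = '\n'
tab = '\t'
backslash = '\\'
quote = '\''

print(len(newline), len(tab), len(backslash), len(quote))  # 1 1 1 1

# ord() returns the code point of the single resulting character
print(ord(newline), ord(tab), ord(backslash))  # 10 9 92

# repr() shows the escaped form instead of the raw character
print(repr('a\tb'))  # 'a\tb'
```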
Sequence | Description |
---|---|
`\a` | Bell (BEL) |
`\b` | Backspace (BS) |
`\f` | New page (FF - Form Feed) |
`\v` | Vertical Tab (VT) |
`\uXXXX` | Character with 16-bit (2 bytes) hex value |
`\UXXXXXXXX` | Character with 32-bit (4 bytes) hex value |
`\ooo` | ASCII character with octal value |
`\xhh` | ASCII character with hex value |
>>> print('\U0001F680')
🚀
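The numeric escapes are different spellings of the same character. A minimal sketch, using the letter A (code point 65) as an arbitrary example:

```python
# Four ways to write the letter A (decimal 65, hex 41, octal 101)
same = '\x41' == '\101' == '\u0041' == '\U00000041' == 'A'
print(same)  # True

# Code points above U+FFFF do not fit in \uXXXX
# and need the eight-digit \UXXXXXXXX form
rocket = '\U0001F680'
print(len(rocket), hex(ord(rocket)))  # 1 0x1f680
```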
7.1.1. Case Study
Problem with paths on Windows
Let's take a look at file path notation on POSIX-compliant operating systems.
Linux paths:
>>> print('/home/mwatney/newfile.txt')
/home/mwatney/newfile.txt
macOS paths:
>>> print('/Users/mwatney/newfile.txt')
/Users/mwatney/newfile.txt
And now for something completely different, Windows paths:
>>> print('C:/Users/mwatney/newfile.txt')
C:/Users/mwatney/newfile.txt
However, paths on Windows conventionally do not use slashes. The native
path separator is the backslash. This is where all the problems start.
Let's change slashes to backslashes one at a time, starting from the end
(the one before `newfile.txt`):
>>> print('C:/Users/mwatney\newfile.txt')
C:/Users/mwatney
ewfile.txt
This is because `\n` is a newline character. To make this work, we need
to escape the backslash.
>>> print('C:/Users/mwatney\\newfile.txt')
C:/Users/mwatney\newfile.txt
This is better. Now the next slash, this time the one before `mwatney`:
>>> print('C:/Users\mwatney\\newfile.txt')
C:/Users\mwatney\newfile.txt
Since Python 3.12, unrecognized escape sequences (in this case `\m`)
should be escaped or placed inside a raw string. For now this only
produces a warning (`SyntaxWarning: invalid escape sequence '\m'`), so we
can ignore it, but it is expected to become an error in a future version,
so it is better to avoid it now:
>>> print('C:/Users\\mwatney\\newfile.txt')
C:/Users\mwatney\newfile.txt
OK, we are getting somewhere. The last slash (the one before `Users`):
>>> print('C:\Users\\mwatney\\newfile.txt')
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
This time the problem is more serious. The problem is with `\Users`.
After the escape sequence `\U`, Python expects eight hexadecimal digits
of a Unicode code point, e.g. `\U0001F600`, which is the smiley 😀
emoticon. In this example, Python finds the letter `s`, which is not a
valid hexadecimal digit, and therefore raises a `SyntaxError` telling the
user that the escape sequence could not be decoded. The only valid
hexadecimal digits are `0123456789abcdefABCDEF`, and the letter `s` isn't
one of them.
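The error can be reproduced in a controlled way by compiling the literal from a string and catching the exception. This is a sketch for illustration. The `source` variable and the `'<example>'` filename are hypothetical:

```python
# Source text containing the problematic literal. The raw string
# stops Python from interpreting \U before compile() sees it.
source = r"path = 'C:\Users'"

try:
    compile(source, '<example>', 'exec')
except SyntaxError as error:
    message = str(error)
    print('SyntaxError:', message)
else:
    message = None

# A complete eight-digit escape is accepted without any error
smiley = '\U0001F600'
print(len(smiley))  # 1
```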
There are two ways to avoid this problem. Escape every backslash:
>>> print('C:\\Users\\mwatney\\newfile.txt')
C:\Users\mwatney\newfile.txt
Or use an r-string (raw string):
>>> print(r'C:\Users\mwatney\newfile.txt')
C:\Users\mwatney\newfile.txt
Both produce the same output, so you can choose either one. In my opinion r-strings are less error-prone, and I use them whenever I have to deal with paths and regular expressions.
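A short sketch confirming that the two spellings produce the same string object value, plus a regular expression for comparison. The sample text `'Sol 549'` is made up for illustration:

```python
import re

escaped = 'C:\\Users\\mwatney\\newfile.txt'
raw = r'C:\Users\mwatney\newfile.txt'
print(escaped == raw)  # True

# In a raw string, \n stays as two characters: a backslash and the letter n
print(len(r'\n'), len('\n'))  # 2 1

# Raw strings keep patterns readable: \d reaches the re module intact
match = re.search(r'\d+', 'Sol 549')
print(match.group())  # 549
```

One caveat: a raw string literal cannot end with a single backslash, so a path with a trailing separator still needs the escaped form.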