5.2. String Literals
f'...' - Format String
r'...' - Raw String
u'...' - Unicode String
b'...' - Byte String
5.2.1. Format String
f-string
String interpolation (variable substitution)
Since Python 3.6
Used for str concatenation
>>> name = 'Mark'
>>> text = 'Hello {name}'
>>>
>>> print(text)
Hello {name}
>>> name = 'Mark'
>>> text = f'Hello {name}'
>>>
>>> print(text)
Hello Mark
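A minimal sketch comparing f-string interpolation with plain concatenation. The `age` variable and the wording of the messages are made up for illustration only. It also shows that the braces can hold any expression, not just a variable name.

```python
name = 'Mark'
age = 42  # illustrative value, not from the text above

# Concatenation needs an explicit str() conversion for non-string values
concatenated = 'Hello ' + name + ', age ' + str(age)

# An f-string converts the values while formatting
interpolated = f'Hello {name}, age {age}'

# Any expression is allowed inside the braces
next_year = f'{name} will be {age + 1}'

print(concatenated)
print(interpolated)
print(next_year)
```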
5.2.2. Raw String
Escape sequences are not interpreted
>>> text = 'Hello\nWorld'
>>> print(text)
Hello
World
>>> text = r'Hello\nWorld'
>>> print(text)
Hello\nWorld
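One way to see the difference is to count characters. The sketch below (variable names are illustrative) shows that the regular literal stores a single newline character, while the raw literal stores a backslash followed by the letter n.

```python
regular = 'Hello\nWorld'  # \n becomes one newline character
raw = r'Hello\nWorld'     # backslash and n stay as two characters

print(len(regular))
print(len(raw))

# A raw string is equal to a regular string with the backslash escaped
print(raw == 'Hello\\nWorld')
```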
5.2.3. Unicode Literal
In Python 3 str is Unicode
In Python 2 str is Bytes
In Python 3 u'...' is only for compatibility with Python 2
>>> text = u'hello' # hello
>>> text = u'cześć' # cześć
In Python 3 all string literals are Unicode, so there is no need to add the u-prefix anymore:
>>> text = 'hello' # hello
>>> text = 'cześć' # cześć
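A quick check, written as a small sketch with illustrative variable names, that the prefixed and unprefixed literals produce the same type and the same value in Python 3:

```python
plain = 'hello'
prefixed = u'hello'

# Both literals create an ordinary str object
print(type(plain) is str)
print(type(prefixed) is str)

# The values are equal, the prefix changes nothing
print(plain == prefixed)
```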
5.2.4. Bytes Literal
Used while reading from low level devices and drivers
Used in sockets and HTTP connections
bytes is a sequence of octets (integers between 0 and 255)
bytes.decode() - converts bytes to str (using UTF-8 encoding)
str.encode() - converts str to bytes (using UTF-8 encoding)
>>> text = b'hello' # hello
>>> text = b'cze\xc5\x9b\xc4\x87' # cześć
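The "sequence of octets" idea can be sketched as follows. The variable names are illustrative. Note that the number of bytes differs from the number of characters, because each of the two Polish letters takes two bytes in UTF-8.

```python
data = b'cze\xc5\x9b\xc4\x87'
text = data.decode()  # UTF-8 by default

print(len(text))  # number of characters
print(len(data))  # number of bytes

# Indexing a bytes object yields an integer between 0 and 255
print(data[0])

# Iterating yields the individual octets
print(list(b'hello'))
```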
Convert str to bytes:
>>> text = 'cześć'
>>>
>>> text.encode()
b'cze\xc5\x9b\xc4\x87'
Convert bytes to str:
>>> data = b'cze\xc5\x9b\xc4\x87'
>>>
>>> data.decode()
'cześć'
UTF-8 is the default encoding used by str.encode() and bytes.decode(). You can also pass a different encoding name to encode and decode data.
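A short sketch of passing an explicit encoding. ISO-8859-2 is picked here only as an example of a codec that can represent Polish letters; it is an assumption of this example, not something the text above prescribes. Decoding has to use the same encoding that was used for encoding.

```python
text = 'cześć'

utf8 = text.encode()                # UTF-8 is the default
latin2 = text.encode('iso-8859-2')  # example of a different codec

print(utf8)
print(latin2)

# Round trip works when the same encoding is used in both directions
print(latin2.decode('iso-8859-2') == text)

# Decoding with the wrong codec fails: these bytes are not valid UTF-8
try:
    latin2.decode()
except UnicodeDecodeError as error:
    print('cannot decode:', error.reason)
```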
5.2.5. Multiple letters
>>> name = 'Mark'
>>> text = fr'Hello\n{name}'
>>> print(text)
Hello\nMark
>>> name = 'Mark'
>>> text = rf'Hello\n{name}'
>>> print(text)
Hello\nMark
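As a minimal check (variable names are illustrative), the order of the prefix letters does not matter, and the r prefix can be combined with b as well:

```python
name = 'Mark'

first = fr'Hello\n{name}'
second = rf'Hello\n{name}'

# Both orders give the same result: raw backslash, interpolated name
print(first == second)
print(first)

# r can also be combined with b to get raw bytes
print(rb'Hello\nWorld' == b'Hello\\nWorld')
```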
5.2.6. Lowercase vs Uppercase
Python does not differentiate between lowercase and uppercase prefixes
Both work exactly the same
VS Code highlights r'...' as a regex and R'...' as a raw string
>>> text = r'Hello\nWorld'
>>> print(text)
Hello\nWorld
>>> text = R'Hello\nWorld'
>>> print(text)
Hello\nWorld
>>> text = 'hello' # unicode
>>>
>>> text = u'hello' # unicode
>>> text = b'hello' # bytes
>>> text = f'hello' # f-string
>>> text = r'hello' # raw-string
>>>
>>> text = U'hello' # unicode
>>> text = B'hello' # bytes
>>> text = F'hello' # f-string
>>> text = R'hello' # raw-string
5.2.7. Case Study
Problem with paths on Windows
Let's take a look at the file path notation on POSIX-compliant operating systems.
Linux paths:
>>> print('/home/mwatney/newfile.txt')
/home/mwatney/newfile.txt
macOS paths:
>>> print('/Users/mwatney/newfile.txt')
/Users/mwatney/newfile.txt
And now for something completely different, Windows paths:
>>> print('C:/Users/mwatney/newfile.txt')
C:/Users/mwatney/newfile.txt
However, the native path separator on Windows is the backslash, not the
slash. This is where all the problems start. Let's start changing
slashes to backslashes from the end (the one before newfile.txt):
>>> print('C:/Users/mwatney\newfile.txt')
C:/Users/mwatney
ewfile.txt
This is because \n is a newline character. In order for this to work
we need to escape it.
>>> print('C:/Users/mwatney\\newfile.txt')
C:/Users/mwatney\newfile.txt
This is better, now another slash, this time the one before mwatney:
>>> print('C:/Users\mwatney\\newfile.txt')
SyntaxWarning: invalid escape sequence '\m'
C:/Users\mwatney\newfile.txt
Since Python 3.12 all unrecognized escape sequences (in this case \m)
produce a warning and should be escaped or put inside a raw string. This is
only a warning (SyntaxWarning: invalid escape sequence '\m'), so we can
ignore it, but it is planned to become an error in a future version, so it
is better to avoid it now:
>>> print('C:/Users\\mwatney\\newfile.txt')
C:/Users\mwatney\newfile.txt
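The warning can also be observed programmatically. The sketch below compiles the source text of a literal that contains \m and records the warnings raised during compilation. The '<example>' filename is a placeholder, and the category is SyntaxWarning on Python 3.12 and newer (older versions report DeprecationWarning instead).

```python
import warnings

# Source code of a string literal with the unrecognized escape \m
source = r"'C:/Users\mwatney'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # record every warning
    value = eval(compile(source, '<example>', 'eval'))

# The backslash is kept in the resulting string
print(value)

# Show what the compiler complained about
for warning in caught:
    print(warning.category.__name__, warning.message)
```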
Ok, we are getting somewhere. The last slash (the one before Users):
>>> print('C:\Users\\mwatney\\newfile.txt')
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
This time the problem is more serious. The problem is with \Users. After
the escape sequence \U Python expects exactly eight hexadecimal digits
naming a Unicode code point, e.g. \U0001F600 which is a smiley 😀 emoticon.
In this example, Python finds the letter s, which is not a valid hexadecimal
digit, and therefore raises a SyntaxError telling the user that there is an
error with decoding bytes. The only valid hexadecimal digits are
0123456789abcdefABCDEF and the letter s isn't one of them.
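Both halves of this explanation can be checked with a short sketch. The first part builds a character from a complete \U escape; the second part compiles the source text of the broken literal and catches the error. The '<example>' filename is a placeholder chosen for this illustration.

```python
import unicodedata

# \U followed by exactly eight hexadecimal digits is a valid escape
face = '\U0001F600'
print(face, len(face), unicodedata.name(face))

# Compiling the source text 'C:\Users' fails, because \Users is not
# a complete \UXXXXXXXX escape
source = r"'C:\Users'"
message = None
try:
    compile(source, '<example>', 'eval')
except SyntaxError as error:
    message = str(error)

print(message)
```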
There are two ways to avoid this problem. Escape every backslash:
>>> print('C:\\Users\\mwatney\\newfile.txt')
C:\Users\mwatney\newfile.txt
Or use an r-string:
>>> print(r'C:\Users\mwatney\newfile.txt')
C:\Users\mwatney\newfile.txt
Both will generate the same output, so you can choose either one. In my opinion r-strings are less error-prone, and I use them every time I have to deal with paths and regular expressions.
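A small sketch confirming that the two notations describe the same string. It additionally uses `PureWindowsPath` from the standard `pathlib` module, which is not part of the case study above and is shown only as an optional way to inspect a Windows path on any operating system.

```python
from pathlib import PureWindowsPath

escaped = 'C:\\Users\\mwatney\\newfile.txt'
raw = r'C:\Users\mwatney\newfile.txt'

# Both literals describe exactly the same string
print(escaped == raw)

# PureWindowsPath parses Windows path syntax without touching the disk
path = PureWindowsPath(raw)
print(path.name)
print(path.parent)

# Forward slashes are accepted and normalized to backslashes
same = PureWindowsPath('C:/Users/mwatney/newfile.txt') == path
print(same)
```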
5.2.8. Recap
f'...' - f-string - Format String (variable substitution)
r'...' - r-string - Raw String (escape sequences are not interpreted)
u'...' - u-string - Unicode String (for compatibility with Python 2)
b'...' - b-string - Byte String (low level and network communication)
bytes.decode() - converts bytes to str (using UTF-8 encoding)
str.encode() - converts str to bytes (using UTF-8 encoding)
Format string:
>>> name = 'Mark'
>>> text = f'Hello {name}'
>>>
>>> print(text)
Hello Mark
Raw string:
>>> text = r'Hello\nWorld'
>>> print(text)
Hello\nWorld
Unicode string:
>>> text = u'hello' # hello
>>> text = u'cześć' # cześć
Bytes string:
>>> text = b'hello' # hello
>>> text = b'cze\xc5\x9b\xc4\x87' # cześć
Conversion:
>>> text = 'cześć'
>>>
>>> text.encode()
b'cze\xc5\x9b\xc4\x87'
>>> data = b'cze\xc5\x9b\xc4\x87'
>>>
>>> data.decode()
'cześć'
5.2.9. Use Case - 1
Raw-string in Regular Expressions:
>>> '\\b[a-z]+\\b'
'\\b[a-z]+\\b'
>>> r'\b[a-z]+\b'
'\\b[a-z]+\\b'
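To see why this matters, here is a minimal sketch using the standard `re` module; the sample text is made up for illustration. In a regular string without escaping, \b is a backspace character, so the regex engine never sees a word boundary.

```python
import re

text = 'hello world 123'  # illustrative sample text

# Raw string: \b reaches the regex engine as a word boundary
raw_result = re.findall(r'\b[a-z]+\b', text)
print(raw_result)

# Escaped backslashes build the identical pattern
escaped_result = re.findall('\\b[a-z]+\\b', text)
print(escaped_result)

# Without r or escaping, \b is a backspace character (\x08),
# so the pattern looks for a literal backspace and finds nothing
plain_result = re.findall('\b[a-z]+\b', text)
print(plain_result)
```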
5.2.10. Use Case - 2
There are no problems with escapes in POSIX compliant paths:
>>> path = '/home/mwatney/myfile.txt' # Linux
>>> path = '/Users/mwatney/myfile.txt' # macOS
On Windows, paths contain backslashes, which Python treats as the escape character. In order to avoid problems you can use slashes instead of backslashes:
>>> path = 'c:/Users/mwatney/myfile.txt'
This is not typical for this operating system, so hardly anyone does that. Typically users write paths with backslashes, and that's OK as long as you use escaped backslashes or raw strings:
>>> path = 'c:\\Users\\mwatney\\myfile.txt'
>>> path = r'c:\Users\mwatney\myfile.txt'
As soon as you forget about using either of them, the problem occurs:
>>> path = 'c:\Users\mwatney\myfile.txt'
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
The problem is with \Users. After the escape sequence \U Python expects
exactly eight hexadecimal digits naming a Unicode code point, e.g.
'\U0001F680' which is a rocket 🚀 emoticon. In this example, Python finds
the letter s, which is not a valid hexadecimal digit, and therefore raises
a SyntaxError telling the user that there is an error with decoding bytes.
The only valid hexadecimal digits are 0123456789abcdefABCDEF and s isn't
one of them.
5.2.11. Assignments
# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author
# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`
# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2
# %% English
# 1. Define `result` with value `Hello X`
# 2. In place of X, insert value of the `NAME` variable
# 3. Use f-string
# 4. Run doctests - all must succeed
# %% Polish
# 1. Zdefiniuj `result` z wartością `Hello X`
# 2. W miejsce X, wstaw wartość zmiennej `NAME`
# 3. Użyj f-string
# 4. Uruchom doctesty - wszystkie muszą się powieść
# %% Hints
# - `f'...'`
# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> result
'Hello Mark'
"""
NAME = 'Mark'
# Define `result` with value `Hello X`
# In place of X, insert value of the `NAME` variable
# Use f-string
# type: str
result = ...
# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2
# %% English
# 1. Define `result` with value `Hello\nWorld`
# 2. Use r-string
# 3. Run doctests - all must succeed
# %% Polish
# 1. Zdefiniuj `result` z wartością `Hello\nWorld`
# 2. Użyj r-string
# 3. Uruchom doctesty - wszystkie muszą się powieść
# %% Hints
# - `r'...'`
# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> result
'Hello\\\\nWorld'
"""
# Define `result` with value `Hello\nWorld`
# Use r-string
# type: str
result = ...
# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 2
# - Minutes: 2
# %% English
# 1. Define `result_a` with value `hello`, use unicode string
# 2. Define `result_b` with value `hello`, use bytes string
# 3. Run doctests - all must succeed
# %% Polish
# 1. Zdefiniuj `result_a` z wartością `hello`, użyj ciągu znaków unicode
# 2. Zdefiniuj `result_b` z wartością `hello`, użyj ciągu bajtów
# 3. Uruchom doctesty - wszystkie muszą się powieść
# %% Hints
# - `u'...'`
# - `b'...'`
# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> assert result_a is not Ellipsis, \
'Assign your result to variable `result_a`'
>>> assert type(result_a) is str, \
'Variable `result_a` has invalid type, should be str'
>>> result_a
'hello'
>>> assert result_b is not Ellipsis, \
'Assign your result to variable `result_b`'
>>> assert type(result_b) is bytes, \
'Variable `result_b` has invalid type, should be bytes'
>>> result_b
b'hello'
"""
# Define `result_a` with value `hello`, use unicode string
# type: str
result_a = ...
# Define `result_b` with value `hello`, use bytes string
# type: bytes
result_b = ...
# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2
# %% English
# 1. Define `result` with unicode string `DATA` encoded to bytes
# 2. Use UTF-8 encoding
# 3. Run doctests - all must succeed
# %% Polish
# 1. Zdefiniuj `result` ze stringiem unicode `DATA` zakodowanym do bajtów
# 2. Użyj kodowania UTF-8
# 3. Uruchom doctesty - wszystkie muszą się powieść
# %% Hints
# - `str.encode(encoding)`
# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is bytes, \
'Variable `result` has invalid type, should be bytes'
>>> result
b'cze\\xc5\\x9b\\xc4\\x87'
"""
DATA = 'cześć'
# Define `result` with unicode string `DATA` encoded to byte string
# Use UTF-8 encoding
# type: bytes
result = ...
# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2
# %% English
# 1. Define `result` with bytes `DATA` decoded to unicode string
# 2. Use UTF-8 encoding
# 3. Run doctests - all must succeed
# %% Polish
# 1. Zdefiniuj `result` z bajtami `DATA` zdekodowanym do ciągu znaków unicode
# 2. Użyj kodowania UTF-8
# 3. Uruchom doctesty - wszystkie muszą się powieść
# %% Hints
# - `bytes.decode(encoding)`
# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> result
'cześć'
"""
DATA = b'cze\xc5\x9b\xc4\x87'
# Define `result` with bytes `DATA` decoded to unicode string
# Use UTF-8 encoding
# type: str
result = ...