2.2. String Literals
f'...' - Format String
r'...' - Raw String
u'...' - Unicode String
b'...' - Byte String
2.2.1. Format String
f-string
String interpolation (variable substitution)
Since Python 3.6
Used for str concatenation
name = 'Alice'
text = 'Hello {name}'
print(text)
Hello {name}
name = 'Alice'
text = f'Hello {name}'
print(text)
Hello Alice
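To make the link to str concatenation concrete, here is a minimal sketch comparing the two styles. The `age` variable is an assumption added only for illustration; it is not part of the example above.

```python
name = 'Alice'
age = 30  # hypothetical value, used only for illustration

# Concatenation needs an explicit str() conversion for non-str values
concatenated = 'Hello ' + name + ', age ' + str(age)

# An f-string converts and substitutes the values in place
interpolated = f'Hello {name}, age {age}'

# Any expression can go inside the braces, not only a variable name
next_year = f'{name} will be {age + 1}'

print(concatenated == interpolated)
print(next_year)
```

Both approaches build the same string; the f-string keeps the template and the values in one place.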
2.2.2. Raw String
Escape sequences (such as \n) are not interpreted
text = 'Hello\nAlice'
print(text)
Hello
Alice
text = r'Hello\nAlice'
print(text)
Hello\nAlice
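One way to see the difference is to count characters. The sketch below reuses the two literals from above; the variable names `regular` and `raw` are chosen here for illustration.

```python
regular = 'Hello\nAlice'
raw = r'Hello\nAlice'

# In a regular string \n is a single newline character
print(len(regular))

# In a raw string the backslash and the letter n are two separate characters
print(len(raw))

# A raw string is the same as a regular string with the backslash escaped
print(raw == 'Hello\\nAlice')
```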
2.2.3. Unicode Literal
In Python 3 str is Unicode
In Python 2 str is Bytes
In Python 3 u'...' is only for compatibility with Python 2
text = u'hello' # hello
text = u'cześć' # cześć
Since Python 3 all strings are unicode literals and there is no need to add u-prefix anymore:
text = 'hello' # hello
text = 'cześć' # cześć
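A short check, assuming Python 3, that the u-prefix changes nothing. The names `prefixed` and `plain` are chosen here for illustration.

```python
prefixed = u'cześć'
plain = 'cześć'

# Both literals create an ordinary str object
print(type(prefixed) is str)

# The values are equal, with or without the prefix
print(prefixed == plain)

# len() counts Unicode characters, not bytes
print(len(plain))
```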
2.2.4. Bytes Literal
Used while reading from low level devices and drivers
Used in sockets and HTTP connections
bytes is a sequence of octets (integers between 0 and 255)
bytes.decode() - converts bytes to str (using UTF-8 encoding)
str.encode() - converts str to bytes (using UTF-8 encoding)
text = b'hello' # hello
text = b'cze\xc5\x9b\xc4\x87' # cześć
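The "sequence of octets" description can be checked directly. This is a minimal sketch; the names `data`, `octets` and `rebuilt` are chosen here for illustration.

```python
data = b'hello'

# Iterating over bytes yields integers, one per octet
octets = list(data)
print(octets)

# Indexing returns an integer as well, not a one-character string
print(data[0])

# The same object can be rebuilt from a list of integers
rebuilt = bytes([104, 101, 108, 108, 111])
print(rebuilt == data)

# Non-ASCII characters take more than one octet in UTF-8
print(len(b'cze\xc5\x9b\xc4\x87'))
```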
Convert str to bytes:
text = 'cześć'
text.encode()
b'cze\xc5\x9b\xc4\x87'
Convert bytes to str:
data = b'cze\xc5\x9b\xc4\x87'
data.decode()
'cześć'
UTF-8 is the default encoding for str.encode() and bytes.decode(). You can also pass a different encoding name to encode and decode data.
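A minimal sketch of passing an explicit encoding. The choice of ISO-8859-2 and ASCII is an assumption made here for illustration; any codec name known to Python can be used.

```python
text = 'cześć'

utf8 = text.encode()                # no argument means UTF-8
explicit = text.encode('utf-8')     # same result, encoding named explicitly
latin2 = text.encode('iso-8859-2')  # single-byte encoding that covers Polish letters

print(utf8 == explicit)
print(len(utf8), len(latin2))

# Decoding must use the same encoding that was used for encoding
print(latin2.decode('iso-8859-2') == text)

# An encoding that cannot represent a character raises an error
try:
    text.encode('ascii')
except UnicodeEncodeError:
    ascii_failed = True
else:
    ascii_failed = False
print(ascii_failed)
```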
2.2.5. Multiple letters
name = 'Alice'
text = fr'Hello\n{name}'
print(text)
Hello\nAlice
name = 'Alice'
text = rf'Hello\n{name}'
print(text)
Hello\nAlice
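The order of the letters does not matter, and the raw prefix can be combined with the bytes prefix in the same way. A short sketch (the `rb`/`br` part is an addition not shown in the examples above):

```python
name = 'Alice'

# fr'...' and rf'...' are the same raw f-string
print(fr'Hello\n{name}' == rf'Hello\n{name}')

# rb'...' and br'...' create raw bytes: the backslash stays as a separate octet
raw_bytes = rb'Hello\n'
print(raw_bytes == br'Hello\n')
print(len(raw_bytes))
```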
2.2.6. Lowercase vs Uppercase
Python does not differentiate between lowercase and uppercase prefixes
Both work exactly the same
VS Code highlights r'...' as a regex and R'...' as a raw string
text = r'Hello\nAlice'
print(text)
Hello\nAlice
text = R'Hello\nAlice'
print(text)
Hello\nAlice
text = 'hello' # unicode
text = u'hello' # unicode
text = b'hello' # bytes
text = f'hello' # f-string
text = r'hello' # raw-string
text = U'hello' # unicode
text = B'hello' # bytes
text = F'hello' # f-string
text = R'hello' # raw-string
2.2.7. Case Study
Problem with paths on Windows
Let's take a look at file path notation on POSIX-compliant operating systems.
Linux paths:
print('/home/mwatney/newfile.txt')
/home/mwatney/newfile.txt
macOS paths:
print('/Users/mwatney/newfile.txt')
/Users/mwatney/newfile.txt
And now for something completely different, Windows paths:
print('C:/Users/mwatney/newfile.txt')
C:/Users/mwatney/newfile.txt
However, paths on Windows conventionally use a backslash, not a slash, as the
path separator. This is where all the problems start. Let's start changing
slashes to backslashes from the end (the one before newfile.txt):
print('C:/Users/mwatney\newfile.txt')
C:/Users/mwatney
ewfile.txt
This is because \n is a newline character. In order for this to work
we need to escape the backslash.
print('C:/Users/mwatney\\newfile.txt')
C:/Users/mwatney\newfile.txt
This is better. Now another slash, this time the one before mwatney:
print('C:/Users\mwatney\\newfile.txt')
SyntaxWarning: invalid escape sequence '\m'
C:/Users\mwatney\newfile.txt
Since Python 3.12, unrecognized escape sequences (in this case \m) raise a
SyntaxWarning and need to be escaped or put inside a raw string. This is only
a warning (SyntaxWarning: invalid escape sequence '\m'), so we can ignore it,
but it is planned to become an error in a future Python version, so it is
better to avoid it now:
print('C:/Users\\mwatney\\newfile.txt')
C:/Users\mwatney\newfile.txt
Ok, we are getting somewhere. The last slash (the one before Users):
print('C:\Users\\mwatney\\newfile.txt')
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
This time the problem is more serious. The problem is with \Users. After the
escape sequence \U Python expects eight hexadecimal digits of a Unicode code
point, e.g. \U0001F600, which is the 😀 emoji. In this example, Python finds
the letter s, which is not a valid hexadecimal digit, and therefore raises a
SyntaxError telling the user that there is an error with decoding bytes. The
only valid hexadecimal digits are 0123456789abcdefABCDEF and the letter s
isn't one of them.
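Both sides of the \U escape can be tried out in a short sketch: a complete eight-digit escape, and the truncated one from the Windows path. The broken literal is kept as source text in a raw string and passed to compile(), so that the error can be caught instead of stopping the script; the names `smiley`, `source` and `message` are chosen here for illustration.

```python
# A complete \U escape: eight hexadecimal digits give a single character
smiley = '\U0001F600'
print(len(smiley))
print(smiley == chr(0x1F600))

# Source text of the broken literal; \U is followed by 's', not by hex digits
source = r"'C:\Users'"

try:
    compile(source, '<demo>', 'eval')
except SyntaxError as error:
    message = str(error)
else:
    message = None

print(message)
```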
There are two ways to avoid this problem. Escape every backslash:
print('C:\\Users\\mwatney\\newfile.txt')
C:\Users\mwatney\newfile.txt
Or use an r-string:
print(r'C:\Users\mwatney\newfile.txt')
C:\Users\mwatney\newfile.txt
Both will generate the same output, so you can choose either one. In my opinion r-strings are less error-prone and I use them whenever I have to deal with paths and regular expressions.
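A quick check that the two notations really produce the same string. The variable names `escaped` and `raw` are chosen here for illustration.

```python
escaped = 'C:\\Users\\mwatney\\newfile.txt'
raw = r'C:\Users\mwatney\newfile.txt'

# Both literals describe exactly the same characters
print(escaped == raw)

# Each path contains three single backslashes
print(raw.count('\\'))

# Splitting on the backslash gives the path components
print(raw.split('\\'))
```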
2.2.8. Recap
f'...' - f-string - Format String (variable substitution)
r'...' - r-string - Raw String (escape sequences are not interpreted)
u'...' - u-string - Unicode String (for compatibility with Python 2)
b'...' - b-string - Byte String (low level and network communication)
bytes.decode() - converts bytes to str (using UTF-8 encoding)
str.encode() - converts str to bytes (using UTF-8 encoding)
Format string:
name = 'Alice'
text = f'Hello {name}'
print(text)
Hello Alice
Raw string:
text = r'Hello\nAlice'
print(text)
Hello\nAlice
Unicode string:
text = u'hello' # hello
text = u'cześć' # cześć
Bytes string:
text = b'hello' # hello
text = b'cze\xc5\x9b\xc4\x87' # cześć
Conversion:
text = 'cześć'
text.encode()
b'cze\xc5\x9b\xc4\x87'
data = b'cze\xc5\x9b\xc4\x87'
data.decode()
'cześć'
2.2.9. Use Case - 1
Raw-string in Regular Expressions:
'\\b[a-z]+\\b'
'\\b[a-z]+\\b'
r'\b[a-z]+\b'
'\\b[a-z]+\\b'
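To show why this matters in practice, here is a minimal sketch using the re module. The sample `text` and the variable names are assumptions made for illustration.

```python
import re

text = 'hello world 123'  # hypothetical sample input

# Escaped and raw notation describe the same pattern: \b is a word boundary
escaped = re.findall('\\b[a-z]+\\b', text)
raw = re.findall(r'\b[a-z]+\b', text)
print(escaped == raw)
print(raw)

# Without escaping or the r-prefix, '\b' is a backspace character,
# so the pattern looks for backspaces and finds nothing
unescaped = re.findall('\b[a-z]+\b', text)
print(unescaped)
```

The raw notation is shorter and reads the same way the pattern is written in regular expression documentation.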
2.2.10. Assignments
# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2
# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author
# %% English
# 1. Define `result` with value `Hello X`
# 2. In place of X, insert value of the `NAME` variable
# 3. Use f-string
# 4. Run doctests - all must succeed
# %% Polish
# 1. Zdefiniuj `result` z wartością `Hello X`
# 2. W miejsce X, wstaw wartość zmiennej `NAME`
# 3. Użyj f-string
# 4. Uruchom doctesty - wszystkie muszą się powieść
# %% Hints
# - `f'...'`
# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> result
'Hello Alice'
"""
# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`
# %% Imports
# %% Types
result: str
# %% Data
NAME = 'Alice'
# %% Result
result = ...
# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2
# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author
# %% English
# 1. Define `result` with value `Hello\nWorld`
# 2. Use r-string
# 3. Run doctests - all must succeed
# %% Polish
# 1. Zdefiniuj `result` z wartością `Hello\nWorld`
# 2. Użyj r-string
# 3. Uruchom doctesty - wszystkie muszą się powieść
# %% Hints
# - `r'...'`
# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> result
'Hello\\\\nWorld'
"""
# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`
# %% Imports
# %% Types
result: str
# %% Data
# %% Result
result = ...
# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 2
# - Minutes: 2
# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author
# %% English
# 1. Define `result_a` with value `hello`, use unicode string
# 2. Define `result_b` with value `hello`, use bytes string
# 3. Run doctests - all must succeed
# %% Polish
# 1. Zdefiniuj `result_a` z wartością `hello`, użyj ciągu znaków unicode
# 2. Zdefiniuj `result_b` z wartością `hello`, użyj ciągu bajtów
# 3. Uruchom doctesty - wszystkie muszą się powieść
# %% Hints
# - `u'...'`
# - `b'...'`
# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> assert result_a is not Ellipsis, \
'Assign your result to variable `result_a`'
>>> assert type(result_a) is str, \
'Variable `result_a` has invalid type, should be str'
>>> result_a
'hello'
>>> assert result_b is not Ellipsis, \
'Assign your result to variable `result_b`'
>>> assert type(result_b) is bytes, \
'Variable `result_b` has invalid type, should be bytes'
>>> result_b
b'hello'
"""
# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`
# %% Imports
# %% Types
result_a: str
result_b: bytes
# %% Data
# %% Result
result_a = ...
result_b = ...
# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2
# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author
# %% English
# 1. Define `result` with unicode string `DATA` encoded to bytes
# 2. Use UTF-8 encoding
# 3. Run doctests - all must succeed
# %% Polish
# 1. Zdefiniuj `result` ze stringiem unicode `DATA` zakodowanym do bajtów
# 2. Użyj kodowania UTF-8
# 3. Uruchom doctesty - wszystkie muszą się powieść
# %% Hints
# - `str.encode(encoding)`
# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is bytes, \
'Variable `result` has invalid type, should be bytes'
>>> result
b'cze\\xc5\\x9b\\xc4\\x87'
"""
# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`
# %% Imports
# %% Types
result: bytes
# %% Data
DATA = 'cześć'
# %% Result
result = ...
# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2
# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author
# %% English
# 1. Define `result` with bytes `DATA` decoded to unicode string
# 2. Use UTF-8 encoding
# 3. Run doctests - all must succeed
# %% Polish
# 1. Zdefiniuj `result` z bajtami `DATA` zdekodowanym do ciągu znaków unicode
# 2. Użyj kodowania UTF-8
# 3. Uruchom doctesty - wszystkie muszą się powieść
# %% Hints
# - `bytes.decode(encoding)`
# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'
>>> result
'cześć'
"""
# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`
# %% Imports
# %% Types
result: str
# %% Data
DATA = b'cze\xc5\x9b\xc4\x87'
# %% Result
result = ...