2.2. String Literals

Important

f'...' - Format String
r'...' - Raw String
u'...' - Unicode String
b'...' - Byte String

2.2.1. Format String

f-string
String interpolation (variable substitution)
Since Python 3.6
Used for str concatenation

>>> name = 'Alice'
>>> text = 'Hello {name}'
>>>
>>> print(text)
Hello {name}

>>> name = 'Alice'
>>> text = f'Hello {name}'
>>>
>>> print(text)
Hello Alice
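
The braces are not limited to a bare variable name. They can hold any expression, optionally followed by a format specifier. A minimal sketch; the `price` variable and the chosen format are illustrative assumptions, not part of the example above:

```python
name = 'Alice'
price = 1234.5   # hypothetical value, used only for this illustration

# Any expression can appear inside the braces
greeting = f'Hello {name.upper()}'

# A format specifier follows the colon: thousands separator, two decimals
label = f'Total: {price:,.2f}'

# Doubled braces produce literal braces
literal = f'{{name}} is replaced by {name}'

print(greeting)   # Hello ALICE
print(label)      # Total: 1,234.50
print(literal)    # {name} is replaced by Alice
```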

2.2.2. Raw String

Escape sequences are not interpreted (a backslash stays a backslash)

>>> text = 'Hello\nAlice'
>>> print(text)
Hello
Alice

>>> text = r'Hello\nAlice'
>>> print(text)
Hello\nAlice
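
One way to see the difference between the two literals above is to compare their lengths. This is only a sketch; the variable names `regular` and `raw` are made up for the illustration:

```python
regular = 'Hello\nAlice'   # \n is a single character: a newline
raw = r'Hello\nAlice'      # backslash and n are two separate characters

print(len(regular))   # 11
print(len(raw))       # 12

# A raw string is an ordinary str; only the literal notation differs
print(raw == 'Hello\\nAlice')   # True
print(type(raw) is str)         # True
```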

2.2.3. Unicode Literal

In Python 3 str is Unicode
In Python 2 str is Bytes
In Python 3 u'...' is only for compatibility with Python 2

>>> text = u'hello'   # hello
>>> text = u'cześć'   # cześć

Since Python 3 all string literals are Unicode, so there is no need to add the u prefix anymore:

>>> text = 'hello'   # hello
>>> text = 'cześć'   # cześć
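
A quick check that the u prefix changes nothing in Python 3. The names `plain` and `prefixed` are illustrative:

```python
plain = 'cześć'
prefixed = u'cześć'

print(plain == prefixed)       # True
print(type(prefixed) is str)   # True
print(len(plain))              # 5 (characters, not bytes)
```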

2.2.4. Bytes Literal

Used while reading from low level devices and drivers
Used in sockets and HTTP connections
bytes is a sequence of octets (integers between 0 and 255)
bytes.decode() - converts bytes to str (UTF-8 by default)
str.encode() - converts str to bytes (UTF-8 by default)

>>> text = b'hello'                 # hello
>>> text = b'cze\xc5\x9b\xc4\x87'   # cześć

Convert str to bytes:

>>> text = 'cześć'
>>>
>>> text.encode()
b'cze\xc5\x9b\xc4\x87'

Convert bytes to str:

>>> data = b'cze\xc5\x9b\xc4\x87'
>>>
>>> data.decode()
'cześć'

UTF-8 is the default encoding. You can also pass a different encoding name to encode() and decode().
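
A short sketch of what a non-default encoding looks like in practice. The choice of `iso-8859-2` (a single-byte encoding that covers Polish letters) is an assumption made for this illustration:

```python
text = 'cześć'

utf8 = text.encode()                 # same as text.encode('utf-8')
latin2 = text.encode('iso-8859-2')   # one byte per character in this encoding

print(utf8)          # b'cze\xc5\x9b\xc4\x87'
print(len(utf8))     # 7
print(len(latin2))   # 5

# bytes behaves like a sequence of integers between 0 and 255
print(list(b'hello'))   # [104, 101, 108, 108, 111]

# Decoding must use the same encoding that produced the bytes
print(latin2.decode('iso-8859-2'))   # cześć
```

The same text takes 7 bytes in UTF-8 and 5 bytes in ISO-8859-2, which is why the encoding name has to travel together with the data.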

2.2.5. Multiple letters

>>> name = 'Alice'
>>> text = fr'Hello\n{name}'
>>> print(text)
Hello\nAlice

>>> name = 'Alice'
>>> text = rf'Hello\n{name}'
>>> print(text)
Hello\nAlice
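
The two examples above suggest that the order of the letters does not matter. A minimal sketch that checks this, and that shows the same combination for bytes (the `br` / `rb` pair is an addition for illustration, not taken from the text above):

```python
name = 'Alice'

a = fr'Hello\n{name}'
b = rf'Hello\n{name}'
print(a == b)   # True

# Raw bytes literals combine the letters b and r in the same way
c = br'\d+'
d = rb'\d+'
print(c == d)   # True
print(len(c))   # 3
```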

2.2.6. Lowercase vs Uppercase

Python does not differentiate between lowercase and uppercase prefixes
Both work exactly the same
VS Code syntax highlighting treats r'...' as a regex and R'...' as a raw string

>>> text = r'Hello\nAlice'
>>> print(text)
Hello\nAlice

>>> text = R'Hello\nAlice'
>>> print(text)
Hello\nAlice

>>> text = 'hello'   # unicode
>>>
>>> text = u'hello'  # unicode
>>> text = b'hello'  # bytes
>>> text = f'hello'  # f-string
>>> text = r'hello'  # raw-string
>>>
>>> text = U'hello'  # unicode
>>> text = B'hello'  # bytes
>>> text = F'hello'  # f-string
>>> text = R'hello'  # raw-string
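
The equivalence can be verified directly by comparing the resulting objects. A small sketch with illustrative variable names:

```python
name = 'Alice'

same_raw = r'Hello\nAlice' == R'Hello\nAlice'
same_bytes = b'hello' == B'hello'
same_format = f'Hello {name}' == F'Hello {name}'
same_mixed = rb'\n' == Rb'\n' == bR'\n' == RB'\n'

print(same_raw, same_bytes, same_format, same_mixed)   # True True True True
```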

2.2.7. Case Study

Problem with paths on Windows

Let's take a look at file path notation on POSIX-compliant operating systems.

Linux paths:

>>> print('/home/mwatney/newfile.txt')
/home/mwatney/newfile.txt

macOS paths:

>>> print('/Users/mwatney/newfile.txt')
/Users/mwatney/newfile.txt

And now for something completely different, Windows paths:

>>> print('C:/Users/mwatney/newfile.txt')
C:/Users/mwatney/newfile.txt

However, paths on Windows conventionally use a backslash, not a slash, as the path separator. This is where all the problems start. Let's change the slashes to backslashes one at a time, starting from the end (the one before newfile.txt):

>>> print('C:/Users/mwatney\newfile.txt')
C:/Users/mwatney
ewfile.txt

This is because \n is a newline character. For this to work, we need to escape the backslash.

>>> print('C:/Users/mwatney\\newfile.txt')
C:/Users/mwatney\newfile.txt

This is better. Now the next slash, this time the one before mwatney:

>>> print('C:/Users\mwatney\\newfile.txt')
SyntaxWarning: invalid escape sequence '\m'
C:/Users\mwatney\newfile.txt

Since Python 3.12, an invalid escape sequence (in this case \m) produces SyntaxWarning: invalid escape sequence '\m'. Such a backslash needs to be escaped or put inside a raw string. For now this is only a warning, so the code still runs, but a future Python version is expected to turn it into an error, so it is better to fix it now:

>>> print('C:/Users\\mwatney\\newfile.txt')
C:/Users\mwatney\newfile.txt
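
The warning for \m is issued when the source code is compiled, not when the string is printed. A sketch that makes this visible by compiling a small piece of source text and recording the warnings; the `source` text and the `<demo>` filename are made up for the illustration, and the exact warning category depends on the Python version (an assumption stated in the comments):

```python
import warnings

# Source text containing the two characters \m inside a string literal
source = r"path = 'C:/Users\mwatney'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    try:
        compile(source, '<demo>', 'exec')
        rejected = False
    except SyntaxError:
        # Assumption: a future Python version may reject invalid escapes outright
        rejected = True

# Expected: SyntaxWarning on Python 3.12+, DeprecationWarning on older versions
categories = {w.category.__name__ for w in caught}
print(rejected, categories)
```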

Ok, we are getting somewhere. The last slash (the one before Users):

>>> print('C:\Users\\mwatney\\newfile.txt')
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

This time the problem is more serious. The problem is with \Users. After the \U escape sequence Python expects an eight-digit hexadecimal Unicode code point, e.g. \U0001F600, which is the 😀 smiley emoticon. In this example Python finds the letter s, which is not a valid hexadecimal digit, and therefore raises a SyntaxError telling the user that the escape sequence cannot be decoded. The only valid hexadecimal digits are 0123456789abcdefABCDEF, and the letter s isn't one of them.
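
For comparison, this is what well-formed \U and \u escapes look like. A minimal sketch; the `accent` example (U+0105) is an addition for illustration:

```python
import unicodedata

smiley = '\U0001F600'   # \U must be followed by exactly eight hex digits
accent = '\u0105'       # \u must be followed by exactly four hex digits

print(smiley == chr(0x1F600))     # True
print(len(smiley))                # 1
print(unicodedata.name(smiley))   # GRINNING FACE
print(unicodedata.name(accent))   # LATIN SMALL LETTER A WITH OGONEK
```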

There are two ways to avoid this problem. Escape every backslash:

>>> print('C:\\Users\\mwatney\\newfile.txt')
C:\Users\mwatney\newfile.txt

Or use r-string:

>>> print(r'C:\Users\mwatney\newfile.txt')
C:\Users\mwatney\newfile.txt

Both will generate the same output, so you can choose either one. In my opinion r-strings are less error-prone, and I use them whenever I have to deal with paths and regular expressions.
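
Besides the two notations above, the standard library's `pathlib` module can assemble a Windows path from its parts, so no backslash has to be typed at all. This is a sketch of an alternative, not something the case study prescribes; `PureWindowsPath` is chosen here because it formats paths the Windows way on any operating system:

```python
from pathlib import PureWindowsPath

# Forward slashes in the input are accepted; the / operator joins components
path = PureWindowsPath('C:/Users') / 'mwatney' / 'newfile.txt'

print(path)          # C:\Users\mwatney\newfile.txt
print(path.name)     # newfile.txt
print(path.parent)   # C:\Users\mwatney

# The result matches the r-string notation
print(str(path) == r'C:\Users\mwatney\newfile.txt')   # True
```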

2.2.8. Recap

f'...' - f-string - Format String (variable substitution)
r'...' - r-string - Raw String (escape sequences are not interpreted)
u'...' - u-string - Unicode String (for compatibility with Python 2)
b'...' - b-string - Byte String (low level and network communication)
bytes.decode() - converts bytes to str (UTF-8 by default)
str.encode() - converts str to bytes (UTF-8 by default)

Format string:

>>> name = 'Alice'
>>> text = f'Hello {name}'
>>>
>>> print(text)
Hello Alice

Raw string:

>>> text = r'Hello\nAlice'
>>> print(text)
Hello\nAlice

Unicode string:

>>> text = u'hello'   # hello
>>> text = u'cześć'   # cześć

Bytes string:

>>> text = b'hello'                 # hello
>>> text = b'cze\xc5\x9b\xc4\x87'   # cześć

Conversion:

>>> text = 'cześć'
>>>
>>> text.encode()
b'cze\xc5\x9b\xc4\x87'

>>> data = b'cze\xc5\x9b\xc4\x87'
>>>
>>> data.decode()
'cześć'

2.2.9. Use Case - 1

Raw-string in Regular Expressions:

>>> '\\b[a-z]+\\b'
'\\b[a-z]+\\b'

>>> r'\b[a-z]+\b'
'\\b[a-z]+\\b'
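
The two literals above are the same pattern. What goes wrong is forgetting both the r prefix and the doubled backslash. A sketch using the `re` module; the sample `text` is made up for the illustration:

```python
import re

text = 'hello big world'

# r'\b' reaches the regex engine as backslash + b: a word boundary
words = re.findall(r'\b[a-z]+\b', text)
print(words)   # ['hello', 'big', 'world']

# Without the r prefix, '\b' is the backspace character (ASCII 8),
# so the pattern looks for backspaces that are not in the text
nothing = re.findall('\b[a-z]+\b', text)
print(nothing)           # []
print('\b' == '\x08')    # True
```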

2.2.10. Assignments

# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2

# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% English
# 1. Define `result` with value `Hello X`
# 2. In place of X, insert value of the `NAME` variable
# 3. Use f-string
# 4. Run doctests - all must succeed

# %% Polish
# 1. Zdefiniuj `result` z wartością `Hello X`
# 2. W miejsce X, wstaw wartość zmiennej `NAME`
# 3. Użyj f-string
# 4. Uruchom doctesty - wszystkie muszą się powieść

# %% Hints
# - `f'...'`

# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'

>>> result
'Hello Alice'
"""

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% Imports

# %% Types
result: str

# %% Data
NAME = 'Alice'

# %% Result
result = ...

# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2

# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% English
# 1. Define `result` with value `Hello\nWorld`
# 2. Use r-string
# 3. Run doctests - all must succeed

# %% Polish
# 1. Zdefiniuj `result` z wartością `Hello\nWorld`
# 2. Użyj r-string
# 3. Uruchom doctesty - wszystkie muszą się powieść

# %% Hints
# - `r'...'`

# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'

>>> result
'Hello\\\\nWorld'
"""

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% Imports

# %% Types
result: str

# %% Data

# %% Result
result = ...

# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 2
# - Minutes: 2

# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% English
# 1. Define `result_a` with value `hello`, use unicode string
# 2. Define `result_b` with value `hello`, use bytes string
# 3. Run doctests - all must succeed

# %% Polish
# 1. Zdefiniuj `result_a` z wartością `hello`, użyj ciągu znaków unicode
# 2. Zdefiniuj `result_b` z wartością `hello`, użyj ciągu bajtów
# 3. Uruchom doctesty - wszystkie muszą się powieść

# %% Hints
# - `u'...'`
# - `b'...'`

# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> assert result_a is not Ellipsis, \
'Assign your result to variable `result_a`'
>>> assert type(result_a) is str, \
'Variable `result_a` has invalid type, should be str'
>>> result_a
'hello'

>>> assert result_b is not Ellipsis, \
'Assign your result to variable `result_b`'
>>> assert type(result_b) is bytes, \
'Variable `result_b` has invalid type, should be bytes'
>>> result_b
b'hello'
"""

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% Imports

# %% Types
result_a: str
result_b: bytes

# %% Data

# %% Result
result_a = ...
result_b = ...

# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2

# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% English
# 1. Define `result` with unicode string `DATA` encoded to bytes
# 2. Use UTF-8 encoding
# 3. Run doctests - all must succeed

# %% Polish
# 1. Zdefiniuj `result` ze stringiem unicode `DATA` zakodowanym do bajtów
# 2. Użyj kodowania UTF-8
# 3. Uruchom doctesty - wszystkie muszą się powieść

# %% Hints
# - `str.encode(encoding)`

# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is bytes, \
'Variable `result` has invalid type, should be bytes'

>>> result
b'cze\\xc5\\x9b\\xc4\\x87'
"""

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% Imports

# %% Types
result: bytes

# %% Data
DATA = 'cześć'

# %% Result
result = ...

# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2

# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% English
# 1. Define `result` with bytes `DATA` decoded to unicode string
# 2. Use UTF-8 encoding
# 3. Run doctests - all must succeed

# %% Polish
# 1. Zdefiniuj `result` z bajtami `DATA` zdekodowanym do ciągu znaków unicode
# 2. Użyj kodowania UTF-8
# 3. Uruchom doctesty - wszystkie muszą się powieść

# %% Hints
# - `bytes.decode(encoding)`

# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'

>>> result
'cześć'
"""

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% Imports

# %% Types
result: str

# %% Data
DATA = b'cze\xc5\x9b\xc4\x87'

# %% Result
result = ...