5.2. String Literals

  • f'...' - Format String

  • r'...' - Raw String

  • u'...' - Unicode String

  • b'...' - Byte String

5.2.1. Format String

  • f-string

  • String interpolation (variable substitution)

  • Since Python 3.6

  • Convenient alternative to str concatenation

>>> name = 'Mark'
>>> text = 'Hello {name}'
>>>
>>> print(text)
Hello {name}
>>> name = 'Mark'
>>> text = f'Hello {name}'
>>>
>>> print(text)
Hello Mark
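Beyond plain variable names, the braces of an f-string can hold any expression and an optional format specification after a colon. Here is a minimal sketch. The variables `age` and `pi` are made up for illustration, and the `=` specifier requires Python 3.8+:

```python
name = 'Mark'
age = 42
pi = 3.14159

greeting = f'Hello {name}'                  # variable substitution
summary = f'{name} will be {age + 1} soon'  # any expression is allowed
rounded = f'{pi:.2f}'                       # format spec after the colon
debug = f'{age=}'                           # Python 3.8+: shows name and value

# The same text built with concatenation needs an explicit str() call
concatenated = name + ' will be ' + str(age + 1) + ' soon'

print(greeting)                 # Hello Mark
print(summary)                  # Mark will be 43 soon
print(rounded)                  # 3.14
print(debug)                    # age=42
print(concatenated == summary)  # True
```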

5.2.2. Raw String

  • Backslashes are not treated as escape characters

>>> text = 'Hello\nWorld'
>>> print(text)
Hello
World
>>> text = r'Hello\nWorld'
>>> print(text)
Hello\nWorld
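The difference is easiest to see by comparing the lengths and the `repr()` of both strings. In the raw version, the backslash and the letter `n` remain two separate characters:

```python
normal = 'Hello\nWorld'
raw = r'Hello\nWorld'

print(len(normal))  # 11: '\n' is one newline character
print(len(raw))     # 12: backslash and 'n' are two characters
print(repr(raw))    # 'Hello\\nWorld'

# A raw string equals a normal string with every backslash doubled
print(raw == 'Hello\\nWorld')  # True
```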

5.2.3. Unicode Literal

  • In Python 3 str is Unicode

  • In Python 2 str is a sequence of bytes

  • In Python 3 the u'...' prefix exists only for compatibility with Python 2 (it was re-allowed in Python 3.3)

>>> text = u'hello'   # hello
>>> text = u'cześć'   # cześć

In Python 3 all string literals are Unicode, so there is no need to add the u prefix anymore:

>>> text = 'hello'   # hello
>>> text = 'cześć'   # cześć
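Both spellings create exactly the same `str` object. Its length is counted in characters, not in bytes:

```python
with_prefix = u'cześć'
without_prefix = 'cześć'

print(with_prefix == without_prefix)  # True
print(type(with_prefix).__name__)     # str
print(len(without_prefix))            # 5 characters
print(len(without_prefix.encode()))   # 7 bytes in UTF-8 (ś and ć take 2 bytes each)
```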

5.2.4. Bytes Literal

  • Used while reading from low level devices and drivers

  • Used in sockets and HTTP connections

  • bytes is a sequence of octets (integers between 0 and 255)

  • bytes.decode() - converts bytes to str (UTF-8 by default)

  • str.encode() - converts str to bytes (UTF-8 by default)

>>> text = b'hello'                 # hello
>>> text = b'cze\xc5\x9b\xc4\x87'   # cześć
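Because bytes is a sequence of integers, indexing or iterating over it yields numbers rather than one-character strings. A short sketch using the encoded word from above:

```python
data = b'cze\xc5\x9b\xc4\x87'

print(data[0])        # 99, the code of 'c'
print(list(data))     # [99, 122, 101, 197, 155, 196, 135]
print(len(data))      # 7 bytes
print(data.decode())  # cześć (5 characters)
```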

Convert str to bytes:

>>> text = 'cześć'
>>>
>>> text.encode()
b'cze\xc5\x9b\xc4\x87'

Convert bytes to str:

>>> data = b'cze\xc5\x9b\xc4\x87'
>>>
>>> data.decode()
'cześć'

UTF-8 is the default encoding. You can also pass a different encoding to str.encode() and bytes.decode().
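Here is a minimal sketch of passing an explicit encoding. `cp1250` (a Windows code page for Central European languages) is used only as an example. Decoding with a mismatched encoding either fails or, with `errors='replace'`, substitutes the U+FFFD replacement character:

```python
text = 'cześć'

utf8 = text.encode('utf-8')
cp1250 = text.encode('cp1250')

# Same characters, different byte representations
print(utf8 == cp1250)           # False
print(cp1250.decode('cp1250'))  # cześć

# Decoding with the wrong encoding fails...
try:
    utf8.decode('ascii')
except UnicodeDecodeError as error:
    print('Cannot decode:', error.reason)

# ...unless you ask Python to replace undecodable bytes
replaced = utf8.decode('ascii', errors='replace')
print(replaced)  # 'cze' followed by four U+FFFD characters
```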

5.2.5. Multiple Prefixes

>>> name = 'Mark'
>>> text = fr'Hello\n{name}'
>>> print(text)
Hello\nMark
>>> name = 'Mark'
>>> text = rf'Hello\n{name}'
>>> print(text)
Hello\nMark
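Not every combination is allowed. `r` can be combined with `f` or `b` in either order, but `u` cannot be combined with anything and `b` cannot be combined with `f`. The sketch below checks this with `compile()` and shows a raw bytes pattern (`rb'...'`) used with the `re` module. The sample data is made up:

```python
import re

# rb'...' is a raw bytes literal, handy for regexes over binary data
numbers = re.findall(rb'\d+', b'id=42, age=7')
print(numbers)  # [b'42', b'7']

# Try compiling a one-character literal with each prefix
accepted, rejected = [], []
for prefix in ('rb', 'br', 'fr', 'rf', 'ur', 'bf'):
    try:
        compile(prefix + "'x'", '<example>', 'eval')
    except SyntaxError:
        rejected.append(prefix)
    else:
        accepted.append(prefix)

print(accepted)  # ['rb', 'br', 'fr', 'rf']
print(rejected)  # ['ur', 'bf']
```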

5.2.6. Lowercase vs Uppercase

  • Python does not differentiate between uppercase and lowercase prefixes

  • Both work exactly the same

  • VS Code highlights r'...' as a regular expression and R'...' as a plain raw string

>>> text = r'Hello\nWorld'
>>> print(text)
Hello\nWorld
>>> text = R'Hello\nWorld'
>>> print(text)
Hello\nWorld
>>> text = 'hello'   # unicode
>>>
>>> text = u'hello'  # unicode
>>> text = b'hello'  # bytes
>>> text = f'hello'  # f-string
>>> text = r'hello'  # raw-string
>>>
>>> text = U'hello'  # unicode
>>> text = B'hello'  # bytes
>>> text = F'hello'  # f-string
>>> text = R'hello'  # raw-string
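Because the case of the prefix is ignored, each lowercase literal compares equal to its uppercase counterpart:

```python
lowercase = (r'Hello\nWorld', u'hello', b'hello', f'{1 + 1}')
uppercase = (R'Hello\nWorld', U'hello', B'hello', F'{1 + 1}')

print(lowercase == uppercase)  # True
```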

5.2.7. Case Study

  • Problem with paths on Windows

Let's take a look at file path notation in POSIX-compliant operating systems.

Linux paths:

>>> print('/home/mwatney/newfile.txt')
/home/mwatney/newfile.txt

macOS paths:

>>> print('/Users/mwatney/newfile.txt')
/Users/mwatney/newfile.txt

And now for something completely different, Windows paths:

>>> print('C:/Users/mwatney/newfile.txt')
C:/Users/mwatney/newfile.txt

However, Windows traditionally uses the backslash as its path separator, and this is where all the problems start. Let's replace the slashes with backslashes one at a time, starting from the end (the one before newfile.txt):

>>> print('C:/Users/mwatney\newfile.txt')
C:/Users/mwatney
ewfile.txt

This is because \n is the newline character. For this to work, we need to escape the backslash with another backslash:

>>> print('C:/Users/mwatney\\newfile.txt')
C:/Users/mwatney\newfile.txt

This is better. Now the next slash, this time the one before mwatney:

>>> print('C:/Users\mwatney\\newfile.txt')  
SyntaxWarning: invalid escape sequence '\m'
C:/Users\mwatney\newfile.txt

Since Python 3.12, an unrecognized escape sequence (in this case \m) emits SyntaxWarning: invalid escape sequence '\m' (Python 3.6 to 3.11 emitted a DeprecationWarning, hidden by default). Such a backslash has to be escaped or put inside a raw string. For now it is only a warning and the string still works, but it is planned to become an error in a future Python version, so it is better to avoid it now:

>>> print('C:/Users\\mwatney\\newfile.txt')
C:/Users\mwatney\newfile.txt
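The invalid-escape warning can also be observed from code by compiling a string that contains such an escape. In this sketch the recorded category should be SyntaxWarning on Python 3.12+ and DeprecationWarning on Python 3.6 to 3.11:

```python
import warnings

source = r"'C:/Users\mwatney'"  # the \m here is an invalid escape sequence

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # record every warning, even repeats
    compile(source, '<example>', 'eval')

for warning in caught:
    print(warning.category.__name__, '-', warning.message)
```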

OK, we are getting somewhere. Now the last slash (the one before Users):

>>> print('C:\Users\\mwatney\\newfile.txt')
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

This time the problem is more serious, and it is caused by \Users. After the escape sequence \U, Python expects exactly eight hexadecimal digits of a Unicode code point, e.g. \U0001F600, which is the smiley 😀 emoticon. In this example, Python finds the letter s, which is not a valid hexadecimal digit, and therefore raises a SyntaxError telling the user that there was an error decoding bytes. The only valid hexadecimal digits are 0123456789abcdefABCDEF, and the letter s isn't one of them.
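For comparison, here are the Unicode escapes written correctly. `\U` takes exactly eight hex digits, `\u` takes exactly four, and `\N{...}` takes the official character name. The last part reproduces the failing `'C:\Users'` literal through `compile()`, so the script itself still runs:

```python
smiley = '\U0001F600'        # \U + exactly 8 hex digits
e_acute = '\u00e9'           # \u + exactly 4 hex digits
named = '\N{GRINNING FACE}'  # \N{...} uses the Unicode character name

print(smiley, e_acute)       # 😀 é
print(smiley == named)       # True
print(hex(ord(smiley)))      # 0x1f600

# Compiling 'C:\Users' fails: 'sers' is not 8 hex digits
error_message = None
try:
    compile(r"'C:\Users'", '<example>', 'eval')
except SyntaxError as error:
    error_message = error.msg
print(error_message)
```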

There are two ways to avoid this problem. You can escape every backslash:

>>> print('C:\\Users\\mwatney\\newfile.txt')
C:\Users\mwatney\newfile.txt

Or use an r-string:

>>> print(r'C:\Users\mwatney\newfile.txt')
C:\Users\mwatney\newfile.txt

Both generate the same output, so you can choose either one. In my opinion r-strings are less error-prone, and I use them whenever I have to deal with paths and regular expressions.
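You can confirm that both spellings produce an identical string. One caveat: a raw string cannot end with a single backslash (`r'C:\'` is a SyntaxError), so a trailing separator has to be added as a normal escaped string:

```python
escaped = 'C:\\Users\\mwatney\\newfile.txt'
raw = r'C:\Users\mwatney\newfile.txt'

print(escaped == raw)   # True
print(raw.count('\\'))  # 3: each backslash is a single character

# A raw string cannot end with one backslash, so append it separately
directory = r'C:\Users\mwatney' + '\\'
print(directory)        # C:\Users\mwatney\
```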

5.2.8. Recap

  • f'...' - f-string - Format String (variable substitution)

  • r'...' - r-string - Raw String (backslashes are not escape characters)

  • u'...' - u-string - Unicode String (for compatibility with Python 2)

  • b'...' - b-string - Byte String (low level and network communication)

  • bytes.decode() - converts bytes to str (UTF-8 by default)

  • str.encode() - converts str to bytes (UTF-8 by default)

Format string:

>>> name = 'Mark'
>>> text = f'Hello {name}'
>>>
>>> print(text)
Hello Mark

Raw string:

>>> text = r'Hello\nWorld'
>>> print(text)
Hello\nWorld

Unicode string:

>>> text = u'hello'   # hello
>>> text = u'cześć'   # cześć

Bytes string:

>>> text = b'hello'                 # hello
>>> text = b'cze\xc5\x9b\xc4\x87'   # cześć

Conversion:

>>> text = 'cześć'
>>>
>>> text.encode()
b'cze\xc5\x9b\xc4\x87'
>>> data = b'cze\xc5\x9b\xc4\x87'
>>>
>>> data.decode()
'cześć'

5.2.9. Use Case - 1

Raw-string in Regular Expressions:

>>> '\\b[a-z]+\\b'
'\\b[a-z]+\\b'
>>> r'\b[a-z]+\b'
'\\b[a-z]+\\b'
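The difference matters in practice. In a normal string `'\b'` is the backspace character, so the pattern silently matches nothing instead of word boundaries. A sketch with a made-up sample text:

```python
import re

text = 'hello world'

with_raw = re.findall(r'\b[a-z]+\b', text)
without_raw = re.findall('\b[a-z]+\b', text)  # '\b' is a backspace here

print(with_raw)        # ['hello', 'world']
print(without_raw)     # []: the pattern looks for backspace characters
print('\b' == '\x08')  # True
```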

5.2.10. Use Case - 2

There are no problems with escapes in POSIX-compliant paths:

>>> path = '/home/mwatney/myfile.txt'  # Linux
>>> path = '/Users/mwatney/myfile.txt'  # macOS

On Windows, backslashes in paths can form escape sequences. To avoid problems, you can use forward slashes instead of backslashes, which Python on Windows accepts as well:

>>> path = 'c:/Users/mwatney/myfile.txt'

This is not typical for this operating system, therefore hardly anyone does that. Users typically write paths with backslashes, and that's OK as long as you escape them or use raw strings:

>>> path = 'c:\\Users\\mwatney\\myfile.txt'
>>> path = r'c:\Users\mwatney\myfile.txt'

As soon as you forget to do either, the problem occurs:

>>> path = 'c:\Users\mwatney\myfile.txt'
Traceback (most recent call last):
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

The problem is with \Users. After the escape sequence \U, Python expects exactly eight hexadecimal digits of a Unicode code point, e.g. '\U0001F680', which is the rocket 🚀 emoticon. In this example, Python finds the letter s, which is not a valid hexadecimal digit, and therefore raises a SyntaxError telling the user that there was an error decoding bytes. The only valid hexadecimal digits are 0123456789abcdefABCDEF, and s isn't one of them.
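Another option, not covered in this chapter, is the standard library's `pathlib`. `PureWindowsPath` accepts both separators and renders the path with backslashes, so you can write forward slashes in your code. It does no filesystem access, so this sketch runs on any operating system:

```python
from pathlib import PureWindowsPath

path = PureWindowsPath('c:/Users/mwatney/myfile.txt')

print(str(path))  # c:\Users\mwatney\myfile.txt
print(path.name)  # myfile.txt
print(path == PureWindowsPath(r'c:\Users\mwatney\myfile.txt'))  # True
```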

5.2.11. Assignments

# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2

# %% English
# 1. Define `result` with value `Hello X`
# 2. In place of X, insert value of the `NAME` variable
# 3. Use f-string
# 4. Run doctests - all must succeed

# %% Polish
# 1. Zdefiniuj `result` z wartością `Hello X`
# 2. W miejsce X, wstaw wartość zmiennej `NAME`
# 3. Użyj f-string
# 4. Uruchom doctesty - wszystkie muszą się powieść

# %% Hints
# - `f'...'`

# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'

>>> result
'Hello Mark'
"""

NAME = 'Mark'

# Define `result` with value `Hello X`
# In place of X, insert value of the `NAME` variable
# Use f-string
# type: str
result = ...


# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2

# %% English
# 1. Define `result` with value `Hello\nWorld`
# 2. Use r-string
# 3. Run doctests - all must succeed

# %% Polish
# 1. Zdefiniuj `result` z wartością `Hello\nWorld`
# 2. Użyj r-string
# 3. Uruchom doctesty - wszystkie muszą się powieść

# %% Hints
# - `r'...'`

# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'

>>> result
'Hello\\\\nWorld'
"""

# Define `result` with value `Hello\nWorld`
# Use r-string
# type: str
result = ...


# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 2
# - Minutes: 2

# %% English
# 1. Define `result_a` with value `hello`, use unicode string
# 2. Define `result_b` with value `hello`, use bytes string
# 3. Run doctests - all must succeed

# %% Polish
# 1. Zdefiniuj `result_a` z wartością `hello`, użyj ciągu znaków unicode
# 2. Zdefiniuj `result_b` z wartością `hello`, użyj ciągu bajtów
# 3. Uruchom doctesty - wszystkie muszą się powieść

# %% Hints
# - `u'...'`
# - `b'...'`

# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> assert result_a is not Ellipsis, \
'Assign your result to variable `result_a`'
>>> assert type(result_a) is str, \
'Variable `result_a` has invalid type, should be str'
>>> result_a
'hello'

>>> assert result_b is not Ellipsis, \
'Assign your result to variable `result_b`'
>>> assert type(result_b) is bytes, \
'Variable `result_b` has invalid type, should be bytes'
>>> result_b
b'hello'
"""

# Define `result_a` with value `hello`, use unicode string
# type: str
result_a = ...

# Define `result_b` with value `hello`, use bytes string
# type: bytes
result_b = ...


# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2

# %% English
# 1. Define `result` with unicode string `DATA` encoded to bytes
# 2. Use UTF-8 encoding
# 3. Run doctests - all must succeed

# %% Polish
# 1. Zdefiniuj `result` ze stringiem unicode `DATA` zakodowanym do bajtów
# 2. Użyj kodowania UTF-8
# 3. Uruchom doctesty - wszystkie muszą się powieść

# %% Hints
# - `str.encode(encoding)`

# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is bytes, \
'Variable `result` has invalid type, should be bytes'

>>> result
b'cze\\xc5\\x9b\\xc4\\x87'
"""

DATA = 'cześć'

# Define `result` with unicode string `DATA` encoded to byte string
# Use UTF-8 encoding
# type: bytes
result = ...


# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% About
# - Name: Type Str Literals
# - Difficulty: easy
# - Lines: 1
# - Minutes: 2

# %% English
# 1. Define `result` with bytes `DATA` decoded to unicode string
# 2. Use UTF-8 encoding
# 3. Run doctests - all must succeed

# %% Polish
# 1. Zdefiniuj `result` z bajtami `DATA` zdekodowanymi do ciągu znaków unicode
# 2. Użyj kodowania UTF-8
# 3. Uruchom doctesty - wszystkie muszą się powieść

# %% Hints
# - `bytes.decode(encoding)`

# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> assert result is not Ellipsis, \
'Assign your result to variable `result`'
>>> assert type(result) is str, \
'Variable `result` has invalid type, should be str'

>>> result
'cześć'
"""

DATA = b'cze\xc5\x9b\xc4\x87'

# Define `result` with bytes `DATA` decoded to unicode string
# Use UTF-8 encoding
# type: str
result = ...