5.3. Case Study: Unique Keys
5.3.1. Setup
Setup code used for all examples:
>>> DATA = [
...     {'firstname': 'Alice', 'lastname': 'Apricot'},
...     {'firstname': 'Bob', 'age': 31},
...     {'lastname': 'Corn', 'firstname': 'Carol'},
...     {'lastname': 'Durian', 'age': 33},
...     {'age': 34, 'firstname': 'Eve'},
...     {'age': 15, 'lastname': 'Mallory'},
... ]
5.3.2. List Append If
Append a key only if it is not already in the list:
>>> # %%timeit -r 1000 -n 1000
>>> result = []
>>> for row in DATA:
...     for key in row.keys():
...         if key not in result:
...             result.append(key)
2.16 µs ± 26.5 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)
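A side effect of the list approach is that keys stay in first-seen order, which a set does not guarantee. The sketch below shows that order and, as an alternative that is not part of the benchmark above, `dict.fromkeys()`, which also keeps insertion order. The name `ordered` is only illustrative.

```python
# Same sample rows as in the Setup section
DATA = [
    {'firstname': 'Alice', 'lastname': 'Apricot'},
    {'firstname': 'Bob', 'age': 31},
    {'lastname': 'Corn', 'firstname': 'Carol'},
    {'lastname': 'Durian', 'age': 33},
    {'age': 34, 'firstname': 'Eve'},
    {'age': 15, 'lastname': 'Mallory'},
]

# Approach from this section: membership test before every append
result = []
for row in DATA:
    for key in row.keys():
        if key not in result:
            result.append(key)

print(result)  # ['firstname', 'lastname', 'age']

# Alternative (not benchmarked here): dict keys are unique
# and preserve insertion order
ordered = list(dict.fromkeys(key for row in DATA for key in row))

print(ordered)  # ['firstname', 'lastname', 'age']
```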
5.3.3. List Append
Append to list and deduplicate at the end:
>>> # %%timeit -r 1000 -n 1000
>>> result = []
>>> for row in DATA:
...     for key in row.keys():
...         result.append(key)
>>> result = set(result)
2.5 µs ± 32.9 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)
5.3.4. Set Add
>>> # %%timeit -r 1000 -n 1000
>>> result = set()
>>> for row in DATA:
...     for key in row.keys():
...         result.add(key)
2.12 µs ± 32.4 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)
5.3.5. Set Update
>>> # %%timeit -r 1000 -n 1000
>>> result = set()
>>> for row in DATA:
...     result.update(row.keys())
1.57 µs ± 26.7 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)
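Because `set.update()` and `set.union()` accept several iterables at once, the loop can also be folded into one call. This is a sketch of that variant, not one of the benchmarked solutions, so no timing is claimed for it. Iterating a dict yields its keys, which is why `*DATA` works without calling `.keys()`.

```python
# Same sample rows as in the Setup section
DATA = [
    {'firstname': 'Alice', 'lastname': 'Apricot'},
    {'firstname': 'Bob', 'age': 31},
    {'lastname': 'Corn', 'firstname': 'Carol'},
    {'lastname': 'Durian', 'age': 33},
    {'age': 34, 'firstname': 'Eve'},
    {'age': 15, 'lastname': 'Mallory'},
]

# Unpack all rows as separate arguments; each dict is iterated over its keys
result = set().union(*DATA)

print(sorted(result))  # ['age', 'firstname', 'lastname']
```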
5.3.6. Set Comprehension
>>> # %%timeit -r 1000 -n 1000
>>> result = set(key
...              for record in DATA
...              for key in record.keys())
2.06 µs ± 79.7 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)
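Strictly speaking, the example above passes a generator expression to the `set()` constructor. A set comprehension uses curly braces instead. The sketch below shows both forms producing the same result; the variable names are illustrative and no timing is claimed for the comprehension.

```python
# Same sample rows as in the Setup section
DATA = [
    {'firstname': 'Alice', 'lastname': 'Apricot'},
    {'firstname': 'Bob', 'age': 31},
    {'lastname': 'Corn', 'firstname': 'Carol'},
    {'lastname': 'Durian', 'age': 33},
    {'age': 34, 'firstname': 'Eve'},
    {'age': 15, 'lastname': 'Mallory'},
]

# Generator expression consumed by the set() constructor
from_generator = set(key for record in DATA for key in record.keys())

# Set comprehension: curly braces, no call to set()
from_comprehension = {key for record in DATA for key in record.keys()}

print(from_generator == from_comprehension)  # True
print(sorted(from_comprehension))  # ['age', 'firstname', 'lastname']
```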
5.3.7. Set Comprehension Add
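Pass a generator expression to `set.add()`.
The code adds the generator object itself, not the keys it would yield. Nothing is iterated, which is why it looks so fast, but `result` does not contain the unique keys: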
>>> # %%timeit -r 1000 -n 1000
>>> result = set()
>>> result.add(key
...            for record in DATA
...            for key in record.keys())
447 ns ± 9.52 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)
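The following sketch makes the problem visible: after `add()`, the set holds exactly one element, and that element is a generator. One possible correction, assuming the goal is still to collect keys, is to call `update()`, which iterates its argument. The name `fixed` is illustrative.

```python
from types import GeneratorType

# Same sample rows as in the Setup section
DATA = [
    {'firstname': 'Alice', 'lastname': 'Apricot'},
    {'firstname': 'Bob', 'age': 31},
    {'lastname': 'Corn', 'firstname': 'Carol'},
    {'lastname': 'Durian', 'age': 33},
    {'age': 34, 'firstname': 'Eve'},
    {'age': 15, 'lastname': 'Mallory'},
]

# add() stores its argument as a single element and never iterates it
result = set()
result.add(key for record in DATA for key in record.keys())

print(len(result))  # 1
only_item = next(iter(result))
print(isinstance(only_item, GeneratorType))  # True

# update() iterates the generator and adds every key it yields
fixed = set()
fixed.update(key for record in DATA for key in record.keys())

print(sorted(fixed))  # ['age', 'firstname', 'lastname']
```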
5.3.8. Set Comprehension Update
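Pass a generator of tuples to `set.update()`. Each tuple of keys is added as a single element, so `result` holds tuples rather than individual keys: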
>>> # %%timeit -r 1000 -n 1000
>>> result = set()
>>> result.update(tuple(x.keys()) for x in DATA)
2.06 µs ± 45.9 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)
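A quick check of what this variant actually produces, followed by one possible correction. Unpacking the generator with `*` hands each row's keys to `update()` as a separate iterable, so individual keys are added. The name `flat` is illustrative and the corrected variant was not part of the benchmark.

```python
# Same sample rows as in the Setup section
DATA = [
    {'firstname': 'Alice', 'lastname': 'Apricot'},
    {'firstname': 'Bob', 'age': 31},
    {'lastname': 'Corn', 'firstname': 'Carol'},
    {'lastname': 'Durian', 'age': 33},
    {'age': 34, 'firstname': 'Eve'},
    {'age': 15, 'lastname': 'Mallory'},
]

# update() iterates the generator once; every item it yields is a tuple
result = set()
result.update(tuple(x.keys()) for x in DATA)

print(len(result))  # 6
print(('firstname', 'lastname') in result)  # True
print('firstname' in result)  # False

# Unpacking passes each row's keys as its own iterable argument
flat = set()
flat.update(*(x.keys() for x in DATA))

print(sorted(flat))  # ['age', 'firstname', 'lastname']
```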
5.3.9. Set Comprehension Update Dict
>>> # %%timeit -r 1000 -n 1000
>>> result = set()
>>> for row in DATA:
...     result.update(row)
5.3.10. Set Comprehension Update Tuple
>>> # %%timeit -r 1000 -n 1000
>>> result = set()
>>> for row in DATA:
...     result.update(tuple(row))
2.09 µs ± 16.1 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)
5.3.11. Set Comprehension Update List
>>> # %%timeit -r 1000 -n 1000
>>> result = set()
>>> for row in DATA:
...     result.update(list(row))
2.33 µs ± 30.2 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)
5.3.12. Set Comprehension Update Set
>>> # %%timeit -r 1000 -n 1000
>>> result = set()
>>> for row in DATA:
...     result.update(set(row))
1.71 µs ± 54 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)
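The `%%timeit` comments above are IPython magics. Outside IPython, similar measurements can be sketched with the standard `timeit` module. The function names and the repeat counts below are assumptions chosen for illustration, and absolute numbers will differ between machines and Python versions.

```python
from timeit import repeat

# Same sample rows as in the Setup section
DATA = [
    {'firstname': 'Alice', 'lastname': 'Apricot'},
    {'firstname': 'Bob', 'age': 31},
    {'lastname': 'Corn', 'firstname': 'Carol'},
    {'lastname': 'Durian', 'age': 33},
    {'age': 34, 'firstname': 'Eve'},
    {'age': 15, 'lastname': 'Mallory'},
]


def list_append_if():
    # Variant from "List Append If"
    result = []
    for row in DATA:
        for key in row.keys():
            if key not in result:
                result.append(key)
    return result


def set_update():
    # Variant from "Set Update"
    result = set()
    for row in DATA:
        result.update(row.keys())
    return result


NUMBER = 1000  # calls per run (illustrative value)
RUNS = 5       # number of runs (illustrative value)

timings = {}
for func in (list_append_if, set_update):
    # repeat() returns the total time of NUMBER calls for each run
    totals = repeat(func, repeat=RUNS, number=NUMBER)
    # Best run, converted to seconds per single call
    timings[func.__name__] = min(totals) / NUMBER

for name, seconds in timings.items():
    print(f'{name}: {seconds * 1e9:.0f} ns per call')
```

Before comparing timings, it is worth checking that each variant returns the expected keys, since a fast but incorrect variant tells nothing about performance.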
5.3.13. Assignments
# %% About
# - Name: CaseStudy UniqueKeys All
# - Difficulty: easy
# - Lines: 3
# - Minutes: 5
# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author
# %% English
# 1. Collect unique keys from all rows in one sequence `result`
# 2. Run doctests - all must succeed
# %% Polish
# 1. Zbierz unikalne klucze z wszystkich wierszy w jednej sekwencji `result`
# 2. Uruchom doctesty - wszystkie muszą się powieść
# %% Example
# >>> sorted(result)
# ['age', 'firstname', 'lastname']
# %% Hints
# - `row.keys()`
# - Compare solutions with `Micro-benchmarking`
# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> result is not Ellipsis
True
>>> type(result) in (set, list, tuple, frozenset)
True
>>> sorted(result)
['age', 'firstname', 'lastname']
"""
# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -f -v myfile.py`
# %% Imports
# %% Types
result: set[str]
# %% Data
DATA = [
    {'firstname': 'Alice', 'lastname': 'Apricot'},
    {'firstname': 'Bob', 'age': 31},
    {'lastname': 'Corn', 'firstname': 'Carol'},
    {'lastname': 'Durian', 'age': 33},
    {'age': 34, 'firstname': 'Eve'},
    {'age': 15, 'lastname': 'Mallory'},
]
# %% Result
result = ...