7.1. Mapping Define

  • dict is a key-value store (hash map)

  • Mutable - can add, remove, and modify items

  • Since Python 3.7: dict preserves the insertion order of elements

  • Before Python 3.7: dict order is not guaranteed
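
The ordering guarantee can be checked directly. The following is a minimal sketch, assuming Python 3.7 or newer; the variable names are illustrative only:

```python
# Keys come back in the order in which they were inserted.
crew = {}
crew['pilot'] = 'Rick Martinez'
crew['commander'] = 'Melissa Lewis'
crew['botanist'] = 'Mark Watney'

order = list(crew)  # iterating over a dict yields its keys
print(order)  # ['pilot', 'commander', 'botanist']

# Updating the value of an existing key keeps the key's position.
crew['pilot'] = 'Melissa Lewis'
order_after_update = list(crew)
print(order_after_update)  # ['pilot', 'commander', 'botanist']

# Removing a key and adding it again moves it to the end.
del crew['pilot']
crew['pilot'] = 'Rick Martinez'
order_after_readd = list(crew)
print(order_after_readd)  # ['commander', 'botanist', 'pilot']
```

Order reflects insertion, not sorting: a dict never reorders keys alphabetically on its own.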

How are dictionaries implemented in CPython? [1]

CPython's dictionaries are implemented as resizable hash tables. Compared to B-trees, this gives better performance for lookup (the most common operation by far) under most circumstances, and the implementation is simpler.

Dictionaries work by computing a hash code for each key stored in the dictionary using the hash() built-in function. The hash code varies widely depending on the key and a per-process seed; for example, "Python" could hash to -539294296 while "python", a string that differs by a single bit, could hash to 1142331976. The hash code is then used to calculate a location in an internal array where the value will be stored. Assuming that you're storing keys that all have different hash values, this means that dictionaries take constant time – O(1), in Big-O notation – to retrieve a key.

Figure: dict as a hash map (type-dict-hashmap.png)
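
To make the lookup idea concrete, here is a toy sketch of how a hash code can be turned into a position in an internal array. It is not CPython's actual algorithm, which also has to handle collisions and resizing; `TABLE_SIZE` and `slot_for` are hypothetical names introduced only for this illustration.

```python
TABLE_SIZE = 8  # hypothetical size of the internal array


def slot_for(key, size=TABLE_SIZE):
    """Map the hash code of a key to an index in range(size)."""
    return hash(key) % size


# Small integers hash to themselves in CPython, so these results are stable.
print(slot_for(1))   # 1
print(slot_for(10))  # 2

# String hashes depend on a per-process seed, so the slot may differ between
# runs, but within one process equal keys always map to the same slot.
same_slot = slot_for('Python') == slot_for('Python')
print(same_slot)  # True

# Objects without a hash (for example lists) cannot be placed in the table.
try:
    hash(['a', 'list'])
except TypeError as error:
    message = str(error)
    print(message)
```

Because the slot is computed from the key alone, finding a value does not require scanning the other entries, which is where the constant-time lookup comes from.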

7.1.1. Syntax

  • {} is used more often

  • dict() is more readable

  • Comma after last element is optional

>>> data = {}
>>> data = dict()
>>> data = {
...     'commander': 'Melissa Lewis',
...     'botanist': 'Mark Watney',
...     'pilot': 'Rick Martinez',
... }
>>> data = dict(
...     commander='Melissa Lewis',
...     botanist='Mark Watney',
...     pilot='Rick Martinez',
... )
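
Both spellings build equal dictionaries. One practical difference, sketched below with illustrative variable names: the keyword form of `dict()` only accepts keys that are valid Python identifiers, so other keys need the `{}` literal.

```python
literal = {'commander': 'Melissa Lewis', 'botanist': 'Mark Watney'}
keyword = dict(commander='Melissa Lewis', botanist='Mark Watney')
print(literal == keyword)  # True

# Keys that are not valid identifiers (integers, strings with dashes or
# spaces) cannot be written as keyword arguments; use the literal instead.
mixed = {1: 'Ares1', 'mission-name': 'Ares3'}
print(mixed[1])               # Ares1
print(mixed['mission-name'])  # Ares3
```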

7.1.2. Duplicates

Duplicate keys are overridden by the later value:

>>> data = {
...     'commander': 'Mark Watney',
...     'commander': 'Melissa Lewis',
... }
>>>
>>> data
{'commander': 'Melissa Lewis'}
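
The same rule applies when a dict is built from pairs. The sketch below also shows that "duplicate" means keys that compare equal and have the same hash, which is why `1` and `1.0` count as one key; the variable names are illustrative.

```python
pairs = [('commander', 'Mark Watney'), ('commander', 'Melissa Lewis')]
data = dict(pairs)
print(data)  # {'commander': 'Melissa Lewis'}

# 1 == 1.0 and hash(1) == hash(1.0), so the second entry replaces the value
# while the first key object is kept.
numbers = {1: 'int', 1.0: 'float'}
print(numbers)  # {1: 'float'}
```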

7.1.3. Dict vs Set

  • Both set elements and dict keys must be hashable

  • Both set and dict use the same { and } braces

  • Despite similar syntax, they are different types

>>> data = {1, 2}
>>> type(data)
<class 'set'>
>>>
>>>
>>> data = {1: 2}
>>> type(data)
<class 'dict'>
>>> data = {1, 2, 3, 4}
>>> type(data)
<class 'set'>
>>>
>>>
>>> data = {1: 2, 3: 4}
>>> type(data)
<class 'dict'>

Empty dict and empty set:

>>> data = {1: None}
>>> _ = data.pop(1)
>>>
>>> data
{}
>>> data = {1}
>>> _ = data.pop()
>>>
>>> data
set()
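
Two consequences of the points above, shown as a short sketch with illustrative names: empty braces always create a dict (an empty set has to be written `set()`), and unhashable objects such as lists are rejected by both types.

```python
empty = {}
print(type(empty))  # <class 'dict'>

empty_set = set()
print(type(empty_set))  # <class 'set'>

# Tuples of hashable items are hashable, so they work in both containers.
locations = {(0, 0): 'origin'}
points = {(0, 0), (1, 1)}

# Lists are mutable and unhashable, so both containers raise TypeError.
errors = []
for build in (lambda: {[0, 0]: 'origin'}, lambda: {[0, 0]}):
    try:
        build()
    except TypeError as error:
        errors.append(str(error))

print(len(errors))  # 2
```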

7.1.4. List of Pairs

Pair:

>>> data = ('commander', 'Melissa Lewis')

List of pairs:

>>> data = [
...     ('commander', 'Melissa Lewis'),
...     ('botanist', 'Mark Watney'),
...     ('pilot', 'Rick Martinez'),
... ]

Convert list of pairs to dict:

>>> data = [
...     ('commander', 'Melissa Lewis'),
...     ('botanist', 'Mark Watney'),
...     ('pilot', 'Rick Martinez')
... ]
>>>
>>> dict(data)  # doctest: +NORMALIZE_WHITESPACE
{'commander': 'Melissa Lewis',
 'botanist': 'Mark Watney',
 'pilot': 'Rick Martinez'}
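
The conversion also works in the other direction. A minimal sketch, with illustrative variable names: `dict.items()` gives the pairs back, and `zip()` can build the pairs from two separate sequences.

```python
data = [
    ('commander', 'Melissa Lewis'),
    ('botanist', 'Mark Watney'),
    ('pilot', 'Rick Martinez'),
]

crew = dict(data)

# items() returns a view of (key, value) pairs; list() materializes it.
pairs = list(crew.items())
print(pairs == data)  # True

# zip() pairs up keys and values stored in separate sequences.
roles = ['commander', 'botanist', 'pilot']
names = ['Melissa Lewis', 'Mark Watney', 'Rick Martinez']
zipped = dict(zip(roles, names))
print(zipped == crew)  # True
```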

7.1.5. Length

>>> crew = {
...     'commander': 'Melissa Lewis',
...     'botanist': 'Mark Watney',
...     'pilot': 'Rick Martinez',
... }
>>>
>>>
>>> len(crew)
3
>>>
>>> len(crew.keys())
3
>>>
>>> len(crew.values())
3
>>>
>>> len(crew.items())
3

7.1.6. Use Case - 0x1

  • Git - version control system; hashes map to commits, branches, and tags

>>> git = {
...     'ce16a8ce': 'commit/1',
...     'cae6b510': 'commit/2',
...     '895444a6': 'commit/3',
...     'aef731b5': 'commit/4',
...     '4a92bc79': 'branch/master',
...     'b3bbd85a': 'tag/v1.0',
... }
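
With this structure, resolving a hash to its object is a single dict access. A small sketch reusing the shortened example hashes from above; `'00000000'` is a made-up key that is deliberately absent.

```python
git = {
    'ce16a8ce': 'commit/1',
    'cae6b510': 'commit/2',
    '895444a6': 'commit/3',
    'aef731b5': 'commit/4',
    '4a92bc79': 'branch/master',
    'b3bbd85a': 'tag/v1.0',
}

# Direct lookup by key.
branch = git['4a92bc79']
print(branch)  # branch/master

# Membership test checks keys, not values.
print('b3bbd85a' in git)  # True
print('tag/v1.0' in git)  # False

# get() returns None for an unknown key instead of raising KeyError.
missing = git.get('00000000')
print(missing)  # None
```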

7.1.7. References

7.1.8. Assignments

Code 7.1. Solution
"""
* Assignment: Mapping Define Dict
* Type: class assignment
* Complexity: easy
* Lines of code: 3 lines
* Time: 3 min

English:
    1. Create `result: dict` representing input data
    2. Non-functional requirements:
        a. Assignment verifies creation of `dict()`
        b. Do not parse data, simply model it using dict
        c. Do not use `str.split()`, `slice`, `getitem`, `for`, `while` or
           any other control-flow statement
    3. Run doctests - all must succeed

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert type(result) is dict, \
    'Variable `result` has invalid type, should be dict'
    >>> assert 'firstname' in result.keys(), \
    'Value `firstname` is not in the result keys'
    >>> assert 'lastname' in result.keys(), \
    'Value `lastname` is not in the result keys'
    >>> assert 'missions' in result.keys(), \
    'Value `missions` is not in the result keys'
    >>> assert 'Mark' in result['firstname'], \
    'Value `Mark` is not in the result values'
    >>> assert 'Watney' in result['lastname'], \
    'Value `Watney` is not in the result values'
    >>> assert 'Ares1' in result['missions'], \
    'Value `Ares1` is not in the result values'
    >>> assert 'Ares3' in result['missions'], \
    'Value `Ares3` is not in the result values'
"""


# firstname - Mark
# lastname - Watney
# missions - Ares1, Ares3

# Define dict with keys: firstname, lastname and missions
# type: dict[str,str|list]
result = ...

Code 7.2. Solution
"""
* Assignment: Mapping Generate Pairs
* Type: class assignment
* Complexity: easy
* Lines of code: 1 line
* Time: 2 min

English:
    1. Define `result: dict`
    2. Convert `DATA` to `dict` and assign to `result`
    3. Run doctests - all must succeed

Hints:
    * `dict()`

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from pprint import pprint

    >>> assert type(result) is dict, \
    'Variable `result` has invalid type, should be dict'

    >>> assert all(type(x) is str for x in result.keys()), \
    'All dict keys should be str'

    >>> assert 'sepal_length' in result.keys()
    >>> assert 'sepal_width' in result.keys()
    >>> assert 'petal_length' in result.keys()
    >>> assert 'petal_width' in result.keys()
    >>> assert 'species' in result.keys()

    >>> assert 5.8 in result.values()
    >>> assert 2.7 in result.values()
    >>> assert 5.1 in result.values()
    >>> assert 1.9 in result.values()
    >>> assert 'virginica' in result.values()

    >>> pprint(result, sort_dicts=False)
    {'sepal_length': 5.8,
     'sepal_width': 2.7,
     'petal_length': 5.1,
     'petal_width': 1.9,
     'species': 'virginica'}
"""

DATA = [
    ('sepal_length', 5.8),
    ('sepal_width', 2.7),
    ('petal_length', 5.1),
    ('petal_width', 1.9),
    ('species', 'virginica'),
]

# Dict with converted DATA
# type: dict[str,float|str]
result = ...