11.5. Loop For Dict

  • Since Python 3.7: dict preserves insertion order (guaranteed by the language)

  • Before Python 3.7: dict order is not guaranteed
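
A minimal sketch of what "keeps order" means in practice, assuming Python 3.7 or newer: keys come back in the order they were first inserted, and updating an existing key does not move it. The names `data` and `order` are illustrative only.

```python
# Assumes Python 3.7+, where insertion order is part of the language
data = {}
data['species'] = 'setosa'
data['sepal_length'] = 5.1
data['petal_width'] = 0.2

# Updating an existing key keeps its original position
data['species'] = 'virginica'

# list(dict) collects the keys in iteration order
order = list(data)
print(order)
```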

11.5.1. Iterate

  • By default, iterating over a dict yields its keys

  • Suggested variable name: key

>>> DATA = {
...     'sepal_length': 5.1,
...     'sepal_width': 3.5,
...     'petal_length': 1.4,
...     'petal_width': 0.2,
...     'species': 'setosa',
... }
>>>
>>> for key in DATA:
...     print(key)
sepal_length
sepal_width
petal_length
petal_width
species
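
Because the loop variable is a key, the matching value can be looked up with `DATA[key]` inside the loop. A short sketch reusing the same `DATA`; the `lines` list is a hypothetical helper, added only to make the result easy to inspect.

```python
DATA = {
    'sepal_length': 5.1,
    'sepal_width': 3.5,
    'petal_length': 1.4,
    'petal_width': 0.2,
    'species': 'setosa',
}

lines = []

for key in DATA:
    # Look up the value that belongs to the current key
    value = DATA[key]
    lines.append(f'{key}={value}')

print(lines)
```

When both the key and the value are needed in every step, `dict.items()` avoids the extra lookup.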

11.5.2. Iterate Keys

  • Suggested variable name: key

>>> DATA = {
...     'sepal_length': 5.1,
...     'sepal_width': 3.5,
...     'petal_length': 1.4,
...     'petal_width': 0.2,
...     'species': 'setosa',
... }
>>>
>>> list(DATA.keys())
['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
>>>
>>> for key in DATA.keys():
...     print(key)
sepal_length
sepal_width
petal_length
petal_width
species

11.5.3. Iterate Values

  • Suggested variable name: value

>>> DATA = {
...     'sepal_length': 5.1,
...     'sepal_width': 3.5,
...     'petal_length': 1.4,
...     'petal_width': 0.2,
...     'species': 'setosa',
... }
>>>
>>> list(DATA.values())
[5.1, 3.5, 1.4, 0.2, 'setosa']
>>>
>>> for value in DATA.values():
...     print(value)
5.1
3.5
1.4
0.2
setosa
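
Iterating over values is convenient when the keys do not matter, for example when collecting only the numeric measurements. A sketch under the assumption that measurements are stored as `float` and everything else should be skipped; `numbers` and `total` are illustrative names.

```python
DATA = {
    'sepal_length': 5.1,
    'sepal_width': 3.5,
    'petal_length': 1.4,
    'petal_width': 0.2,
    'species': 'setosa',
}

numbers = []

for value in DATA.values():
    # Keep measurements only; the species name is a str and is skipped
    if type(value) is float:
        numbers.append(value)

total = sum(numbers)
print(numbers)
```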

11.5.4. Iterate Key-Value Pairs

  • Suggested variable name: key, value

Get key-value pairs from dict.items():

>>> DATA = {
...     'sepal_length': 5.1,
...     'sepal_width': 3.5,
...     'petal_length': 1.4,
...     'petal_width': 0.2,
...     'species': 'setosa',
... }
>>>
>>> list(DATA.items())  # doctest: +NORMALIZE_WHITESPACE
[('sepal_length', 5.1),
 ('sepal_width', 3.5),
 ('petal_length', 1.4),
 ('petal_width', 0.2),
 ('species', 'setosa')]
>>>
>>> for key, value in DATA.items():
...     print(key, '->', value)
sepal_length -> 5.1
sepal_width -> 3.5
petal_length -> 1.4
petal_width -> 0.2
species -> setosa
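
Each item is a `(key, value)` tuple, so it can be unpacked directly in the `for` statement and both parts can be used for filtering. A minimal sketch that keeps only the sepal measurements; the name `sepal` is hypothetical.

```python
DATA = {
    'sepal_length': 5.1,
    'sepal_width': 3.5,
    'petal_length': 1.4,
    'petal_width': 0.2,
    'species': 'setosa',
}

sepal = {}

for key, value in DATA.items():
    # Copy only the entries whose key starts with 'sepal'
    if key.startswith('sepal'):
        sepal[key] = value

print(sepal)
```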

11.5.5. List of Dicts

Iterate over a list of dicts:

>>> DATA = [
...     {'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4, 'petal_width': 0.2, 'species': 'setosa'},
...     {'sepal_length': 5.7, 'sepal_width': 2.8, 'petal_length': 4.1, 'petal_width': 1.3, 'species': 'versicolor'},
...     {'sepal_length': 6.3, 'sepal_width': 2.9, 'petal_length': 5.6, 'petal_width': 1.8, 'species': 'virginica'},
... ]
>>>
>>> for row in DATA:
...     sepal_length = row['sepal_length']
...     species = row['species']
...     print(f'{species} -> {sepal_length}')
setosa -> 5.1
versicolor -> 5.7
virginica -> 6.3
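
Each row is an ordinary dict, so `dict.items()` also works inside the outer loop. A sketch of such a nested loop that collects every length measurement together with its species; the `result` list and the `_length` suffix check are assumptions made for illustration.

```python
DATA = [
    {'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4, 'petal_width': 0.2, 'species': 'setosa'},
    {'sepal_length': 5.7, 'sepal_width': 2.8, 'petal_length': 4.1, 'petal_width': 1.3, 'species': 'versicolor'},
    {'sepal_length': 6.3, 'sepal_width': 2.9, 'petal_length': 5.6, 'petal_width': 1.8, 'species': 'virginica'},
]

result = []

for row in DATA:                        # outer loop: one dict per row
    for key, value in row.items():      # inner loop: pairs within that row
        if key.endswith('_length'):
            result.append((row['species'], key, value))

print(result)
```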

11.5.6. Generate with Range

  • range()

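  • range(len(header)) yields the indexes shared by both lists

  • range() is a lazy sequence, not a generator; calling len() on it does not iterate over it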

Create a dict from two lists:

>>> header = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
>>> data = [5.1, 3.5, 1.4, 0.2, 'setosa']
>>> result = {}
>>>
>>> for i in range(len(header)):
...     key = header[i]
...     value = data[i]
...     result[key] = value
>>>
>>> print(result)  # doctest: +NORMALIZE_WHITESPACE
{'sepal_length': 5.1,
 'sepal_width': 3.5,
 'petal_length': 1.4,
 'petal_width': 0.2,
 'species': 'setosa'}
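
The index-based loop above can also be written without indexes. A sketch using the built-in `zip()`, which pairs elements from both lists position by position and stops at the end of the shorter list:

```python
header = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
data = [5.1, 3.5, 1.4, 0.2, 'setosa']
result = {}

# zip() yields (key, value) pairs, so no index variable is needed
for key, value in zip(header, data):
    result[key] = value

print(result)
```

For this simple case `dict(zip(header, data))` builds the same mapping in a single call; the explicit loop is useful when each pair needs extra processing.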

11.5.7. Assignments

"""
* Assignment: Loop Dict Reverse
* Type: class assignment
* Complexity: easy
* Lines of code: 3 lines
* Time: 5 min

English:
    1. Use iteration to reverse dict:
       that is: change keys for values and values for keys
    2. Run doctests - all must succeed

Polish:
    1. Użyj iterowania do odwrócenia dicta:
       to jest: zamień klucze z wartościami i wartości z kluczami
    2. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * `dict.items()`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is dict, \
    'Variable `result` has invalid type, should be dict'

    >>> assert all(type(x) is str for x in result.keys())
    >>> assert all(type(x) is int for x in result.values())
    >>> assert len(result.keys()) == 3

    >>> assert 'virginica' in result.keys()
    >>> assert 'setosa' in result.keys()
    >>> assert 'versicolor' in result.keys()

    >>> assert 0 in result.values()
    >>> assert 1 in result.values()
    >>> assert 2 in result.values()

    >>> result
    {'virginica': 0, 'setosa': 1, 'versicolor': 2}
"""

DATA = {
    0: 'virginica',
    1: 'setosa',
    2: 'versicolor',
}

# Reversed DATA: values become keys and keys become values
# type: dict[str,int]
result = ...

"""
* Assignment: Loop Dict To Dict
* Type: class assignment
* Complexity: medium
* Lines of code: 3 lines
* Time: 8 min

English:
    1. Convert to `result: dict[str, int]`
    2. Run doctests - all must succeed

Polish:
    1. Przekonwertuj do `result: dict[str, int]`
    2. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from pprint import pprint

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is dict, \
    'Variable `result` has invalid type, should be dict'

    >>> pprint(result, sort_dicts=False)
    {'Doctorate': 6,
     'Prof-school': 6,
     'Masters': 5,
     'Bachelor': 5,
     'Engineer': 5,
     'HS-grad': 4,
     'Junior High': 3,
     'Primary School': 2,
     'Kindergarten': 1}
"""

DATA = {
    6: ['Doctorate', 'Prof-school'],
    5: ['Masters', 'Bachelor', 'Engineer'],
    4: ['HS-grad'],
    3: ['Junior High'],
    2: ['Primary School'],
    1: ['Kindergarten'],
}

# Converted DATA. Note that keys are str and values are int!
# type: dict[str,int]
result = ...

"""
* Assignment: Loop Dict Endswith
* Complexity: easy
* Lines of code: 5 lines
* Time: 5 min

English:
    1. Define `result: list[str]`
    2. Collect in `result` all email addresses from `DATA` -> `users`
       with domain names mentioned in `DOMAINS`
    3. Run doctests - all must succeed

Polish:
    1. Zdefiniuj `result: list[str]`
    2. Zbierz w `result` wszystkie adresy email z `DATA` -> `users`
       z nazwami domenowymi wymienionymi w `DOMAINS`
    3. Uruchom doctesty - wszystkie muszą się powieść

Why:
    * Check if you can filter data
    * Check if you know string methods
    * Check if you know how to iterate over list[dict]

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from pprint import pprint

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is list, \
    'Result must be a list'
    >>> assert len(result) > 0, \
    'Result cannot be empty'
    >>> assert all(type(element) is str for element in result), \
    'All elements in result must be a str'

    >>> result = sorted(result)
    >>> pprint(result)
    ['avogel@esa.int',
     'bjohanssen@nasa.gov',
     'cbeck@nasa.gov',
     'mlewis@nasa.gov',
     'mwatney@nasa.gov',
     'rmartinez@nasa.gov']
"""

DATA = {
    'group': 'staff',
    'created': '2000-01-02',
    'modified': '2000-01-05',
    'users': [
        {'name': 'Mark Watney', 'email': 'mwatney@nasa.gov'},
        {'name': 'Melissa Lewis', 'email': 'mlewis@nasa.gov'},
        {'name': 'Rick Martinez', 'email': 'rmartinez@nasa.gov'},
        {'name': 'Pan Twardowski', 'email': 'ptwardowski@polsa.gov.pl'},
        {'name': 'Alex Vogel', 'email': 'avogel@esa.int'},
        {'name': 'Chris Beck', 'email': 'cbeck@nasa.gov'},
        {'name': 'Ivan Ivanovich', 'email': 'iivanovich@roscosmos.ru'},
        {'name': 'Beth Johanssen', 'email': 'bjohanssen@nasa.gov'},
]}

DOMAINS = ('esa.int', 'nasa.gov')

# Email addresses with domain names listed in DOMAINS
# type: list[str]
result = ...


"""
* Assignment: About EntryTest ToListTuple
* Complexity: medium
* Lines of code: 5 lines
* Time: 5 min

English:
    1. Load `DATA` from JSON format
    2. Convert data to `result: list[tuple]`
    3. Add header as a first line
    4. Run doctests - all must succeed

Polish:
    1. Wczytaj `DATA` z formatu JSON
    2. Przekonwertuj dane do `result: list[tuple]`
    3. Dodaj nagłówek jako pierwszą linię
    4. Uruchom doctesty - wszystkie muszą się powieść

Why:
    * Convert data from list[dict] to list[tuple]
    * list[dict] is used to represent JSON data
    * list[tuple] is used to represent CSV data
    * list[tuple] is used to represent database rows
    * JSON is the most popular format in web development

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from pprint import pprint

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> result = list(result)
    >>> assert type(result) is list, \
    'Variable `result` has invalid type, should be list'
    >>> assert len(result) > 0, \
    'Variable `result` should not be empty'
    >>> assert all(type(row) is tuple for row in result), \
    'Variable `result` should be a list[tuple]'

    >>> pprint(result)
    [('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
     (5.8, 2.7, 5.1, 1.9, 'virginica'),
     (5.1, 3.5, 1.4, 0.2, 'setosa'),
     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
     (6.3, 2.9, 5.6, 1.8, 'virginica'),
     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
     (4.7, 3.2, 1.3, 0.2, 'setosa')]
"""

DATA = [
    {'sepal_length': 5.8, 'sepal_width': 2.7, 'petal_length': 5.1, 'petal_width': 1.9, 'species': 'virginica'},
    {'sepal_length': 5.1, 'sepal_width': 3.5, 'petal_length': 1.4, 'petal_width': 0.2, 'species': 'setosa'},
    {'sepal_length': 5.7, 'sepal_width': 2.8, 'petal_length': 4.1, 'petal_width': 1.3, 'species': 'versicolor'},
    {'sepal_length': 6.3, 'sepal_width': 2.9, 'petal_length': 5.6, 'petal_width': 1.8, 'species': 'virginica'},
    {'sepal_length': 6.4, 'sepal_width': 3.2, 'petal_length': 4.5, 'petal_width': 1.5, 'species': 'versicolor'},
    {'sepal_length': 4.7, 'sepal_width': 3.2, 'petal_length': 1.3, 'petal_width': 0.2, 'species': 'setosa'},
]


# Convert DATA from list[dict] to list[tuple]
# type: header = tuple[str,...]
# type: row = tuple[float,float,float,float,str]
# type: list[tuple[header|row,...]]
result = ...

"""
* Assignment: Loop Dict UniqueKeys
* Type: class assignment
* Complexity: easy
* Lines of code: 4 lines
* Time: 5 min

English:
    1. Define `result: set` with unique keys from `DATA`
    2. Run doctests - all must succeed

Polish:
    1. Zdefiniuj `result: set` z unikalnymi kluczami z `DATA`
    2. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from pprint import pprint

    >>> assert result is not Ellipsis, \
    'Assign your result to variable `result`'
    >>> assert type(result) is set, \
    'Variable `result` has invalid type, should be set'

    >>> assert all(type(x) is str for x in result)

    >>> result = sorted(result)
    >>> pprint(result, width=120, sort_dicts=False)
    ['petal_length', 'petal_width', 'sepal_length', 'sepal_width', 'species']
"""

DATA = [
    {'sepal_length': 5.1, 'sepal_width': 3.5, 'species': 'setosa'},
    {'petal_length': 4.1, 'petal_width': 1.3, 'species': 'versicolor'},
    {'sepal_length': 6.3, 'petal_width': 1.8, 'species': 'virginica'},
    {'sepal_length': 5.0, 'petal_width': 0.2, 'species': 'setosa'},
    {'sepal_width': 2.8, 'petal_length': 4.1, 'species': 'versicolor'},
    {'sepal_width': 2.9, 'petal_width': 1.8, 'species': 'virginica'},
]

# Define variable `result` with converted DATA
# type: set
result = ...

"""
* Assignment: Loop Dict LabelEncoder
* Type: homework
* Complexity: hard
* Lines of code: 14 lines
* Time: 13 min

English:
    1. Use `DATA: list[tuple]`
    2. Define `features: list` - list of values (data from columns 0-3)
    3. Define `labels: list` - species names encoded as integers (column 4)
    4. To encode and decode species generate from `DATA` two dictionaries:
        a. `decoder: dict` - eg. {0: 'virginica', 1: 'setosa', 2: 'versicolor'}
        b. `encoder: dict` - eg. {'virginica': 0, 'setosa': 1, 'versicolor': 2}
    5. Run doctests - all must succeed

Polish:
    1. Użyj `DATA: list[tuple]`
    2. Zdefiniuj `features: list` - lista wartości (dane z kolumn 0-3)
    3. Zdefiniuj `labels: list` - nazwy gatunków zakodowane jako liczby (kolumna 4)
    4. Aby móc zakodować i odkodować gatunki wygeneruj z `DATA` dwa słowniki:
        a. `decoder: dict` - np. {0: 'virginica', 1: 'setosa', 2: 'versicolor'}
        b. `encoder: dict` - np. {'virginica': 0, 'setosa': 1, 'versicolor': 2}
    5. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from pprint import pprint

    >>> assert type(features) is list
    >>> assert type(labels) is list
    >>> assert all(type(x) is tuple for x in features)
    >>> assert all(type(x) is int for x in labels)

    >>> pprint(features)
    [(5.8, 2.7, 5.1, 1.9),
     (5.1, 3.5, 1.4, 0.2),
     (5.7, 2.8, 4.1, 1.3),
     (6.3, 2.9, 5.6, 1.8),
     (6.4, 3.2, 4.5, 1.5),
     (4.7, 3.2, 1.3, 0.2)]

    >>> pprint(labels)
    [0, 1, 2, 0, 2, 1]

    >>> decoder
    {0: 'virginica', 1: 'setosa', 2: 'versicolor'}

    >>> encoder
    {'virginica': 0, 'setosa': 1, 'versicolor': 2}
"""

DATA = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
]

# Define `features: list` - list of values (data from columns 0-3)
# type: list[tuple]
features = ...

# Define `labels: list` - species names encoded as integers (column 4)
# type: list[int]
labels = ...

# Generate `decoder: dict` - eg. {0: 'virginica', 1: 'setosa', 2: 'versicolor'}
# type: dict
decoder = ...

# Generate `encoder: dict` - eg. {'virginica': 0, 'setosa': 1, 'versicolor': 2}
# type: dict
encoder = ...