# 12.2. Comprehension List

## 12.2.1. Syntax

Short syntax:

>>> [x for x in range(0,5)]
[0, 1, 2, 3, 4]


Long Syntax:

>>> list(x for x in range(0,5))
[0, 1, 2, 3, 4]
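
The two spellings above produce equal lists, but they get there differently: the bracket form is a list comprehension, while the call form passes a generator expression to list(). A minimal sketch of that difference follows; the names short_form, long_form and gen are illustrative only.

```python
# Bracket form: the list comprehension builds the list directly.
short_form = [x for x in range(0, 5)]

# Call form: a generator expression is handed to the list() constructor.
long_form = list(x for x in range(0, 5))

# The part inside list(...) is a lazy generator object on its own.
gen = (x for x in range(0, 5))

print(short_form == long_form)  # True
print(type(gen).__name__)       # generator
```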


## 12.2.2. Microbenchmark

>>>
... %%timeit -r 1000 -n 1000
... result = []
... for x in range(0,5):
...     result.append(x)
...
457 ns ± 69.4 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)

>>>
... %%timeit -r 1000 -n 1000
... result = [x for x in range(0,5)]
...
411 ns ± 76.6 ns per loop (mean ± std. dev. of 1000 runs, 1000 loops each)
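
The %%timeit magic works only in IPython or Jupyter. Outside of them, a comparable measurement can be sketched with the standard library timeit module. The function names and the repetition count below are assumptions chosen for illustration, and wrapping the code in functions adds call overhead to both measurements. No timings are quoted here because they depend on the machine and the Python version.

```python
import timeit


def with_loop():
    # Explicit loop with list.append()
    result = []
    for x in range(0, 5):
        result.append(x)
    return result


def with_comprehension():
    # Same result built by a list comprehension
    return [x for x in range(0, 5)]


# Both approaches must return the same list for the comparison to be fair.
assert with_loop() == with_comprehension()

# Total time in seconds for 100_000 calls of each function.
loop_time = timeit.timeit(with_loop, number=100_000)
comp_time = timeit.timeit(with_comprehension, number=100_000)

print(f'loop:          {loop_time:.4f} s')
print(f'comprehension: {comp_time:.4f} s')
```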


## 12.2.3. Manipulate Numbers

>>> [x+1 for x in range(0,5)]
[1, 2, 3, 4, 5]
>>>
>>> [x+10 for x in range(0,5)]
[10, 11, 12, 13, 14]

>>> [x*x for x in range(1,5)]
[1, 4, 9, 16]
>>>
>>> [x*(x+1) for x in range(1,5)]
[2, 6, 12, 20]

>>> [x**2 for x in range(0,5)]
[0, 1, 4, 9, 16]
>>>
>>> [x**3 for x in range(0,5)]
[0, 1, 8, 27, 64]
>>>
>>> [2**x for x in range(0,5)]
[1, 2, 4, 8, 16]
>>>
>>> [3**x for x in range(0,5)]
[1, 3, 9, 27, 81]

>>> [1/x for x in range(0,5)]
Traceback (most recent call last):
ZeroDivisionError: division by zero
>>>
>>> [1/x for x in range(1,5)]
[1.0, 0.5, 0.3333333333333333, 0.25]
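
Changing the range start is one way to avoid the ZeroDivisionError above. Another option, sketched here on the assumption that the comprehension filter clause (a trailing if) is covered elsewhere in the course, is to keep range(0, 5) and skip the zero explicitly.

```python
# The filter clause runs before the expression, so 1/x is never
# evaluated for x == 0.
result = [1/x for x in range(0, 5) if x != 0]

print(result)  # [1.0, 0.5, 0.3333333333333333, 0.25]
```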


## 12.2.4. Manipulate Strings

>>> DATA = ['a', 'b', 'c']
>>>
>>> ','.join(DATA)
'a,b,c'
>>>
>>> ','.join(x for x in DATA)
'a,b,c'
>>>
>>> ','.join(x.upper() for x in DATA)
'A,B,C'


## 12.2.5. Type Conversion

>>> DATA = [1, 2, 3]
>>>
>>> [float(x) for x in DATA]
[1.0, 2.0, 3.0]


The str.join() method requires every item of its argument to be a string. If the data contains other types, such as int in the following example, the call raises TypeError. You can convert those values to str with a comprehension.

>>> DATA = [1, 2, 3]
>>>
>>> ','.join(DATA)
Traceback (most recent call last):
TypeError: sequence item 0: expected str instance, int found
>>>
>>> ','.join(str(x) for x in DATA)
'1,2,3'
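
The conversion also works in the opposite direction. The sketch below joins the integers into text and then restores them with a second comprehension; the names text and numbers are illustrative.

```python
DATA = [1, 2, 3]

# int -> str, so that str.join() accepts the items
text = ','.join(str(x) for x in DATA)

# str -> int, restoring the original values from the joined text
numbers = [int(x) for x in text.split(',')]

print(text)     # 1,2,3
print(numbers)  # [1, 2, 3]
```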


## 12.2.6. Slice Sequences

>>> DATA = [
...     ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa'),
... ]
>>>
>>>
>>> [row for row in DATA]
[('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
 (5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> [row for row in DATA[1:]]
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]


## 12.2.7. Slice Data in Sequences

>>> DATA = [
...     ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa'),
... ]
>>>
>>>
>>> [row[-1] for row in DATA[1:]]
['virginica', 'setosa', 'versicolor', 'virginica', 'versicolor', 'setosa']
>>>
>>> [row[0:4] for row in DATA[1:]]
[(5.8, 2.7, 5.1, 1.9),
 (5.1, 3.5, 1.4, 0.2),
 (5.7, 2.8, 4.1, 1.3),
 (6.3, 2.9, 5.6, 1.8),
 (6.4, 3.2, 4.5, 1.5),
 (4.7, 3.2, 1.3, 0.2)]


## 12.2.8. Unpack Sequences

>>> DATA = [
...     ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
...     (5.8, 2.7, 5.1, 1.9, 'virginica'),
...     (5.1, 3.5, 1.4, 0.2, 'setosa'),
...     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...     (6.3, 2.9, 5.6, 1.8, 'virginica'),
...     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...     (4.7, 3.2, 1.3, 0.2, 'setosa'),
... ]
>>>
>>>
>>> [row[0:4] for row in DATA[1:]]
[(5.8, 2.7, 5.1, 1.9),
 (5.1, 3.5, 1.4, 0.2),
 (5.7, 2.8, 4.1, 1.3),
 (6.3, 2.9, 5.6, 1.8),
 (6.4, 3.2, 4.5, 1.5),
 (4.7, 3.2, 1.3, 0.2)]

>>> [row[-1] for row in DATA[1:]]
['virginica', 'setosa', 'versicolor', 'virginica', 'versicolor', 'setosa']
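
The same two results can also be obtained without index-based slicing, by unpacking each row in the for clause of the comprehension. This is a sketch of one possible spelling, using star unpacking; the names header, rows, values, features and labels are illustrative.

```python
DATA = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
]

# Separate the header from the data rows.
header, *rows = DATA

# In each row the last item goes to species; *values collects
# everything before it into a list, hence the tuple() conversion.
features = [tuple(values) for *values, species in rows]
labels = [species for *values, species in rows]

print(features[0])  # (5.8, 2.7, 5.1, 1.9)
print(labels)
```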


## 12.2.9. Use Case - 0x01

• Increment

>>> [x+1 for x in range(0,5)]
[1, 2, 3, 4, 5]


## 12.2.10. Use Case - 0x02

• Decrement

>>> [x-1 for x in range(0,5)]
[-1, 0, 1, 2, 3]


## 12.2.11. Use Case - 0x03

• Sum

>>> sum(x for x in range(0,5))
10


## 12.2.12. Use Case - 0x04

• Even or Odd

>>> [x for x in range(0,5)]
[0, 1, 2, 3, 4]

>>> [x%2==0 for x in range(0,5)]
[True, False, True, False, True]


## 12.2.13. Assignments

"""
* Assignment: Comprehension List Translate
* Type: class assignment
* Complexity: easy
* Lines of code: 1 line
* Time: 3 min

English:
1. Use list comprehension to iterate over DATA
2. If letter is in PL then use conversion value as letter
3. Add letter to result
4. Run doctests - all must succeed

Polish:
1. Użyj rozwinięcia listowego do iteracji po DATA
2. Jeżeli litera jest w PL to użyj skonwertowanej wartości jako litera
3. Dodaj literę do result
4. Uruchom doctesty - wszystkie muszą się powieść

Hints:
* str.join()
* dict.get()

Tests:
>>> import sys; sys.tracebacklimit = 0

>>> assert type(result) is str

>>> result
'zazolc gesla jazn'
"""

PL = {
    'ą': 'a',
    'ć': 'c',
    'ę': 'e',
    'ł': 'l',
    'ń': 'n',
    'ó': 'o',
    'ś': 's',
    'ż': 'z',
    'ź': 'z',
}

DATA = 'zażółć gęślą jaźń'

# DATA with substituted PL diacritic chars to ASCII letters
# type: str
result = ...


"""
* Assignment: Comprehension List Split
* Type: homework
* Complexity: medium
* Lines of code: 4 lines
* Time: 8 min

English:
1. Using List Comprehension split DATA into:
    a. features_train: list[tuple] - 60% of first features in DATA
    b. features_test: list[tuple] - 40% of last features in DATA
    c. labels_train: list[str] - 60% of first labels in DATA
    d. labels_test: list[str] - 40% of last labels in DATA
2. In order to do so, calculate pivot point:
    a. length of DATA times given percent (60% = 0.6)
    b. remember that slice indices must be int, not float
    c. for example: if dataset has 10 rows, then 6 rows will be for
       training, and 4 rows for test
3. Run doctests - all must succeed

Polish:
1. Używając List Comprehension podziel DATA na:
    a. features_train: list[tuple] - 60% pierwszych features w DATA
    b. features_test: list[tuple] - 40% ostatnich features w DATA
    c. labels_train: list[str] - 60% pierwszych labels w DATA
    d. labels_test: list[str] - 40% ostatnich labels w DATA
2. Aby to zrobić, wylicz punkt podziału:
    a. długość DATA razy zadany procent (60% = 0.6)
    b. pamiętaj, że indeksy slice muszą być int a nie float
    c. na przykład: jeżeli zbiór danych ma 10 wierszy, to 6 wierszy będzie
       do treningu, a 4 do testów
3. Uruchom doctesty - wszystkie muszą się powieść

Hints:
* iterable[:split]
* iterable[split:]

Tests:
>>> import sys; sys.tracebacklimit = 0
>>> from pprint import pprint

>>> assert type(features_train) is list, \
    'make sure features_train is a list'

>>> assert type(features_test) is list, \
    'make sure features_test is a list'

>>> assert type(labels_train) is list, \
    'make sure labels_train is a list'

>>> assert type(labels_test) is list, \
    'make sure labels_test is a list'

>>> assert all(type(x) is tuple for x in features_train), \
    'all elements in features_train should be tuple'

>>> assert all(type(x) is tuple for x in features_test), \
    'all elements in features_test should be tuple'

>>> assert all(type(x) is str for x in labels_train), \
    'all elements in labels_train should be str'

>>> assert all(type(x) is str for x in labels_test), \
    'all elements in labels_test should be str'

>>> pprint(features_train)
[(5.8, 2.7, 5.1, 1.9),
 (5.1, 3.5, 1.4, 0.2),
 (5.7, 2.8, 4.1, 1.3),
 (6.3, 2.9, 5.6, 1.8),
 (6.4, 3.2, 4.5, 1.5),
 (4.7, 3.2, 1.3, 0.2)]

>>> pprint(features_test)
[(7.0, 3.2, 4.7, 1.4),
 (7.6, 3.0, 6.6, 2.1),
 (4.9, 3.0, 1.4, 0.2),
 (4.9, 2.5, 4.5, 1.7)]

>>> pprint(labels_train)
['virginica', 'setosa', 'versicolor', 'virginica', 'versicolor', 'setosa']

>>> pprint(labels_test)
['versicolor', 'virginica', 'setosa', 'virginica']
"""

DATA = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
]

ratio = 0.6
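
The pivot calculation from the split assignment can be tried on simpler data first. The sketch below is not a solution to the assignment: it uses a hypothetical list of ten letters instead of DATA, and the names rows, split, train and test are illustrative. It only shows the two hints, iterable[:split] and iterable[split:], together with the int() conversion that a slice index needs.

```python
# Hypothetical data, not the assignment's DATA: ten letters stand in
# for ten rows.
rows = list('abcdefghij')
ratio = 0.6

# len(rows) * ratio is the float 6.0; a slice index must be an int,
# and int() truncates toward zero.
split = int(len(rows) * ratio)

train = rows[:split]   # first 60% of the rows
test = rows[split:]    # remaining 40% of the rows

print(split)  # 6
print(train)  # ['a', 'b', 'c', 'd', 'e', 'f']
print(test)   # ['g', 'h', 'i', 'j']
```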