9.7. Iterator Filter

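  • filter(callable, iterable)

  • Select the elements of an iterable for which callable returns a true value

  • Returns a lazily evaluated iterator (a filter object, not a generator)

  • required callable - Function taking one argument

  • required iterable - exactly one sequence or iterator object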

>>> def even(x):
...     return x % 2 == 0
>>>
>>> result = (x for x in range(0, 5) if even(x))
>>> result = filter(even, range(0, 5))
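
The callable may also be `None`, in which case `filter()` keeps only the truthy elements. A minimal sketch of that behaviour (the `values` list is made up for illustration):

```python
# Hypothetical sample data mixing truthy and falsy values
values = [0, 1, '', 'a', None, [], [0], False, True]

# With None as the first argument, filter() keeps truthy elements only
truthy = list(filter(None, values))

# Equivalent comprehension
same = [x for x in values if x]

print(truthy)
```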

9.7.1. Not-a-Generator

>>> from inspect import isgeneratorfunction, isgenerator
>>>
>>>
>>> def even(x):
...     return x % 2 == 0
>>>
>>>
>>> isgeneratorfunction(filter)
False
>>>
>>> result = filter(even, [1, 2, 3])
>>> isgenerator(result)
False
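
Although the result is not a generator, it is still an iterator. The sketch below checks this with the abstract base classes from `collections.abc`; the variable names are chosen only for this illustration:

```python
from collections.abc import Generator, Iterator


def even(x):
    return x % 2 == 0


result = filter(even, [1, 2, 3])

# The object is an instance of the built-in `filter` class
is_filter = type(result) is filter

# It implements __iter__() and __next__(), so it is an iterator
is_iterator = isinstance(result, Iterator)

# It lacks send(), throw() and close(), so it is not a generator
is_generator = isinstance(result, Generator)

# Like every iterator, it returns itself from iter()
is_own_iterator = iter(result) is result

print(is_filter, is_iterator, is_generator, is_own_iterator)
```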

9.7.2. Problem

Plain code:

>>> def even(x):
...     return x % 2 == 0
>>>
>>>
>>> DATA = [1, 2, 3, 4, 5, 6]
>>> result = []
>>>
>>> for x in DATA:
...     if even(x):
...         result.append(x)
>>>
>>> print(result)
[2, 4, 6]

Comprehension:

>>> def even(x):
...     return x % 2 == 0
>>>
>>>
>>> DATA = [1, 2, 3, 4, 5, 6]
>>> result = [x for x in DATA if even(x)]
>>>
>>> print(result)
[2, 4, 6]

9.7.3. Solution

>>> def even(x):
...     return x % 2 == 0
>>>
>>>
>>> DATA = [1, 2, 3, 4, 5, 6]
>>> result = filter(even, DATA)
>>>
>>> list(result)
[2, 4, 6]
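
For a short, one-off predicate a lambda expression can stand in for the named function. This is only a sketch of an alternative spelling, not a recommendation over `even()`:

```python
DATA = [1, 2, 3, 4, 5, 6]

# Same selection as filter(even, DATA), with the predicate written inline
result = filter(lambda x: x % 2 == 0, DATA)

evens = list(result)
print(evens)
```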

9.7.4. Lazy Evaluation

>>> def even(x):
...     return x % 2 == 0
>>>
>>>
>>> DATA = [1, 2, 3, 4, 5, 6]
>>> result = filter(even, DATA)
>>>
>>> next(result)
2
>>> next(result)
4
>>> next(result)
6
>>> next(result)
Traceback (most recent call last):
StopIteration
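
To make the laziness visible, the sketch below records every value the predicate is asked about. The `calls` list is a helper introduced only for this illustration. It also shows that a filter object can be consumed only once:

```python
calls = []  # records every value passed to the predicate


def even(x):
    calls.append(x)
    return x % 2 == 0


DATA = [1, 2, 3, 4, 5, 6]
result = filter(even, DATA)

calls_before = list(calls)       # [] - creating the filter evaluates nothing
first = next(result)             # 2 - the predicate ran for 1 and 2 only
calls_after_first = list(calls)  # [1, 2]

rest = list(result)              # [4, 6] - consumes the remaining elements
again = list(result)             # [] - an exhausted filter yields nothing more
```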

9.7.5. Performance

>>> def even(x):
...     return x % 2 == 0
>>>
>>>
>>> data = [1, 2, 3, 4, 5, 6]
>>> 
... %%timeit -r 1000 -n 1000
... result = [x for x in data if even(x)]
1.11 µs ± 139 ns per loop (mean ± std. dev. of 1000 runs, 1,000 loops each)
>>> 
... %%timeit -r 1000 -n 1000
... result = list(filter(even, data))
921 ns ± 112 ns per loop (mean ± std. dev. of 1000 runs, 1,000 loops each)
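
The `%%timeit` cells above require IPython or Jupyter. A rough equivalent using only the standard-library `timeit` module might look like the sketch below. The wrapper functions and the `number` value are assumptions of this example, and absolute timings depend on the machine and Python version:

```python
from timeit import timeit


def even(x):
    return x % 2 == 0


data = [1, 2, 3, 4, 5, 6]


def with_comprehension():
    return [x for x in data if even(x)]


def with_filter():
    return list(filter(even, data))


# timeit() returns the total time in seconds for `number` calls
number = 100_000
t_comprehension = timeit(with_comprehension, number=number)
t_filter = timeit(with_filter, number=number)

print(f'comprehension: {t_comprehension / number * 1e9:.0f} ns per call')
print(f'filter:        {t_filter / number * 1e9:.0f} ns per call')
```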

9.7.6. Use Case - 1

>>> users = [
...     {'age': 41, 'username': 'mwatney'},
...     {'age': 40, 'username': 'mlewis'},
...     {'age': 39, 'username': 'rmartinez'},
...     {'age': 40, 'username': 'avogel'},
...     {'age': 29, 'username': 'bjohanssen'},
...     {'age': 36, 'username': 'cbeck'},
... ]
>>> def above40(user):
...     return user['age'] >= 40
>>>
>>> def under40(user):
...     return user['age'] < 40
>>> result = filter(above40, users)
>>> list(result)  # doctest: +NORMALIZE_WHITESPACE
[{'age': 41, 'username': 'mwatney'},
 {'age': 40, 'username': 'mlewis'},
 {'age': 40, 'username': 'avogel'}]
>>> result = filter(under40, users)
>>> list(result)  # doctest: +NORMALIZE_WHITESPACE
[{'age': 39, 'username': 'rmartinez'},
 {'age': 29, 'username': 'bjohanssen'},
 {'age': 36, 'username': 'cbeck'}]
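
The two predicates above are exact complements of each other. A possible alternative, sketched here with `itertools.filterfalse` from the standard library, derives both groups from a single predicate (the names `older` and `younger` are made up for this example):

```python
from itertools import filterfalse

users = [
    {'age': 41, 'username': 'mwatney'},
    {'age': 40, 'username': 'mlewis'},
    {'age': 39, 'username': 'rmartinez'},
    {'age': 40, 'username': 'avogel'},
    {'age': 29, 'username': 'bjohanssen'},
    {'age': 36, 'username': 'cbeck'},
]


def above40(user):
    return user['age'] >= 40


# filter() keeps elements for which the predicate is true
older = [user['username'] for user in filter(above40, users)]

# filterfalse() keeps elements for which the predicate is false
younger = [user['username'] for user in filterfalse(above40, users)]

print(older)
print(younger)
```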

9.7.7. Use Case - 2

>>> users = [
...     {'is_admin': False, 'name': 'Mark Watney'},
...     {'is_admin': True,  'name': 'Melissa Lewis'},
...     {'is_admin': False, 'name': 'Rick Martinez'},
...     {'is_admin': False, 'name': 'Alex Vogel'},
...     {'is_admin': True,  'name': 'Beth Johanssen'},
...     {'is_admin': False, 'name': 'Chris Beck'},
... ]
>>>
>>>
>>> def admin(user):
...     return user['is_admin'] is True
>>>
>>>
>>> result = filter(admin, users)
>>> list(result)  # doctest: +NORMALIZE_WHITESPACE
[{'is_admin': True, 'name': 'Melissa Lewis'},
 {'is_admin': True, 'name': 'Beth Johanssen'}]
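
When the predicate only reads one key, `operator.itemgetter` can replace the hand-written function. Note that this sketch tests truthiness of the value rather than identity with `True`, so it behaves the same as `admin()` only while `is_admin` holds real booleans:

```python
from operator import itemgetter

users = [
    {'is_admin': False, 'name': 'Mark Watney'},
    {'is_admin': True,  'name': 'Melissa Lewis'},
    {'is_admin': False, 'name': 'Rick Martinez'},
    {'is_admin': False, 'name': 'Alex Vogel'},
    {'is_admin': True,  'name': 'Beth Johanssen'},
    {'is_admin': False, 'name': 'Chris Beck'},
]

# itemgetter('is_admin') returns a callable equivalent to: lambda u: u['is_admin']
is_admin = itemgetter('is_admin')

result = filter(is_admin, users)
names = [user['name'] for user in result]
print(names)
```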

9.7.8. Use Case - 3

>>> users = [
...     'mwatney',
...     'mlewis',
...     'rmartinez',
...     'avogel',
...     'bjohanssen',
...     'cbeck',
... ]
>>>
>>> admins = [
...     'mlewis',
...     'bjohanssen',
... ]
>>>
>>>
>>> def is_admin(user):
...     return user in admins
>>>
>>>
>>> result = filter(is_admin, users)
>>> list(result)
['mlewis', 'bjohanssen']
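
With a list, `user in admins` scans the list on every call. For larger collections one option, sketched below, is to keep the admins in a set, where a membership test takes constant time on average. The order of the result still follows `users`:

```python
users = [
    'mwatney',
    'mlewis',
    'rmartinez',
    'avogel',
    'bjohanssen',
    'cbeck',
]

# A set instead of a list: membership tests do not scan all elements
admins = {'mlewis', 'bjohanssen'}


def is_admin(user):
    return user in admins


result = list(filter(is_admin, users))
print(result)
```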

9.7.9. Use Case - 4

>>> class User:
...     firstname: str
...     lastname: str
...     groups: list[str]
...
...     def __init__(self, firstname, lastname, groups):
...         self.firstname = firstname
...         self.lastname = lastname
...         self.groups = groups
...
...     def __repr__(self):
...         return f'{self.firstname}'
...
>>> DATABASE = [
...     User('Mark', 'Watney', groups=['user', 'staff']),
...     User('Melissa', 'Lewis', groups=['user', 'staff', 'admin']),
...     User('Rick', 'Martinez', groups=['user', 'staff']),
...     User('Alex', 'Vogel', groups=['user']),
...     User('Beth', 'Johanssen', groups=['user', 'staff', 'admin']),
...     User('Chris', 'Beck', groups=['user', 'staff']),
... ]
>>> def is_user(user: User) -> bool:
...     return 'user' in user.groups
>>>
>>> def is_staff(user: User) -> bool:
...     return 'staff' in user.groups
>>>
>>> def is_admin(user: User) -> bool:
...     return 'admin' in user.groups
>>> users = filter(is_user, DATABASE)
>>> staff = filter(is_staff, DATABASE)
>>> admins = filter(is_admin, DATABASE)
>>> list(users)
[Mark, Melissa, Rick, Alex, Beth, Chris]
>>>
>>> list(staff)
[Mark, Melissa, Rick, Beth, Chris]
>>>
>>> list(admins)
[Melissa, Beth]
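
The three predicates above differ only in the group name. One way to avoid the repetition is a function that builds the predicate, sketched below. The helper `in_group()` is hypothetical, and the class is written as a dataclass only to keep the example short:

```python
from dataclasses import dataclass


@dataclass
class User:
    firstname: str
    lastname: str
    groups: list[str]


DATABASE = [
    User('Mark', 'Watney', groups=['user', 'staff']),
    User('Melissa', 'Lewis', groups=['user', 'staff', 'admin']),
    User('Rick', 'Martinez', groups=['user', 'staff']),
    User('Alex', 'Vogel', groups=['user']),
    User('Beth', 'Johanssen', groups=['user', 'staff', 'admin']),
    User('Chris', 'Beck', groups=['user', 'staff']),
]


def in_group(name):
    """Return a predicate that checks membership in the given group."""
    def predicate(user):
        return name in user.groups
    return predicate


staff = [user.firstname for user in filter(in_group('staff'), DATABASE)]
admins = [user.firstname for user in filter(in_group('admin'), DATABASE)]

print(staff)
print(admins)
```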

9.7.10. Assignments

# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% About
# - Name: Iterator Filter Apply
# - Difficulty: easy
# - Lines: 3
# - Minutes: 2

# %% English
# 1. Define function `odd()`:
#    - takes one argument
#    - returns True if argument is odd
#    - returns False if argument is even
# 2. Use `filter()` to apply function `odd()` to DATA
# 3. Define `result: filter` with result
# 4. Run doctests - all must succeed

# %% Hints
# - `filter()`
# - `%` - modulo operator

# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> from inspect import isfunction
>>> assert isfunction(odd), \
'Object `odd` must be a function'

>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'

>>> assert type(result) is filter, \
'Variable `result` has invalid type, should be filter'

>>> result = list(result)
>>> assert type(result) is list, \
'Evaluated `result` has invalid type, should be list'

>>> assert all(type(x) is int for x in result), \
'All rows in `result` should be int'

>>> from pprint import pprint
>>> pprint(result, width=72, sort_dicts=False)
[1, 3, 5, 7, 9]
"""

DATA = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


# Returns True if number is odd (division by 2 leaves a remainder)
# type: Callable[[int], bool]
def odd(x):
    ...

# Filter odd numbers in DATA
# type: filter
result = ...


# %% About
# - Name: Iterator Filter Apply
# - Difficulty: easy
# - Lines: 7
# - Minutes: 5

# %% English
# 1. Define function `valid()` which rejects a line from `DATA` when:
#    - line is empty
#    - line has only spaces
#    - line starts with # (comment)
# 2. Use `filter()` to apply function `valid()` to the lines of `DATA`
# 3. Define `result: filter` with result
# 4. Run doctests - all must succeed

# %% Hints
# - `filter()`
# - `str.splitlines()`
# - `str.startswith()`
# - `len()`

# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> from inspect import isfunction
>>> assert isfunction(valid), \
'Object `valid` must be a function'

>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'

>>> assert type(result) is filter, \
'Variable `result` has invalid type, should be filter'

>>> result = list(result)
>>> assert type(result) is list, \
'Evaluated `result` has invalid type, should be list'

>>> assert all(type(x) is str for x in result), \
'All rows in `result` should be str'

>>> from pprint import pprint
>>> pprint(result, width=72, sort_dicts=False)
['127.0.0.1       localhost',
 '127.0.0.1       astromatt',
 '10.13.37.1      nasa.gov esa.int',
 '255.255.255.255 broadcasthost',
 '::1             localhost']
"""

DATA = """##
# `/etc/hosts` structure:
#    - ip: internet protocol address (IPv4 or IPv6)
#    - hosts: host names
##

127.0.0.1       localhost
127.0.0.1       astromatt
10.13.37.1      nasa.gov esa.int
255.255.255.255 broadcasthost
::1             localhost"""

# Returns False (the line is filtered out) when:
# - line is empty
# - line has only spaces
# - line starts with # (comment)
# type: Callable[[str], bool]
def valid(line):
    ...

# Use `filter()` to apply function `valid()` to the lines of DATA
# type: filter
result = ...


# %% About
# - Name: Iterator Filter Apply
# - Difficulty: easy
# - Lines: 3
# - Minutes: 5

# %% English
# 1. Filter out values from `DATA` that are not numeric (keep only int and float)
# 2. Define `result: filter` with result
# 3. Run doctests - all must succeed

# %% Hints
# - `filter()`
# - `isinstance()`
# - `type()`

# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> from inspect import isfunction

>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'

>>> assert type(result) is filter, \
'Variable `result` has invalid type, should be filter'

>>> result = list(result)
>>> assert type(result) is list, \
'Evaluated `result` has invalid type, should be list'

>>> assert all(type(x) in (int, float) for x in result), \
'All rows in `result` should be int or float'

>>> from pprint import pprint
>>> pprint(result, width=72, sort_dicts=False)
[0, 2.0, 4, 5.0]
"""

DATA = [0, True, 2.0, 'three', 4, 5.0, ['six']]

# Filter out values from `DATA` that are not numeric (keep only int and float)
# type: filter
result = ...