5.9. DataFrame Select
df[df['Morning'] > 0.0]
~
- logical not&
- logical and|
- logical or^
- logical xor
5.9.1. SetUp
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>> df = pd.DataFrame(
... columns = ['Morning', 'Noon', 'Evening', 'Midnight'],
... index = pd.date_range('1999-12-30', periods=7),
... data = np.random.randn(7, 4))
>>>
>>> df
Morning Noon Evening Midnight
1999-12-30 1.764052 0.400157 0.978738 2.240893
1999-12-31 1.867558 -0.977278 0.950088 -0.151357
2000-01-01 -0.103219 0.410599 0.144044 1.454274
2000-01-02 0.761038 0.121675 0.443863 0.333674
2000-01-03 1.494079 -0.205158 0.313068 -0.854096
2000-01-04 -2.552990 0.653619 0.864436 -0.742165
2000-01-05 2.269755 -1.454366 0.045759 -0.187184
Pandas Select Methods:
5.9.2. Query Data
df.where()
Works withinplace=True
>>> df[df['Morning'] > 0.0]
Morning Noon Evening Midnight
1999-12-30 1.764052 0.400157 0.978738 2.240893
1999-12-31 1.867558 -0.977278 0.950088 -0.151357
2000-01-02 0.761038 0.121675 0.443863 0.333674
2000-01-03 1.494079 -0.205158 0.313068 -0.854096
2000-01-05 2.269755 -1.454366 0.045759 -0.187184
>>> query = df['Morning'] > 0.0
>>>
>>> df[query]
Morning Noon Evening Midnight
1999-12-30 1.764052 0.400157 0.978738 2.240893
1999-12-31 1.867558 -0.977278 0.950088 -0.151357
2000-01-02 0.761038 0.121675 0.443863 0.333674
2000-01-03 1.494079 -0.205158 0.313068 -0.854096
2000-01-05 2.269755 -1.454366 0.045759 -0.187184
>>> query = df['Morning'] > 0.0
>>>
>>> df.where(query)
Morning Noon Evening Midnight
1999-12-30 1.764052 0.400157 0.978738 2.240893
1999-12-31 1.867558 -0.977278 0.950088 -0.151357
2000-01-01 NaN NaN NaN NaN
2000-01-02 0.761038 0.121675 0.443863 0.333674
2000-01-03 1.494079 -0.205158 0.313068 -0.854096
2000-01-04 NaN NaN NaN NaN
2000-01-05 2.269755 -1.454366 0.045759 -0.187184
5.9.3. Logical NOT
>>> query = df['Midnight'] < 0.0
>>>
>>> df[~query]
Morning Noon Evening Midnight
1999-12-30 1.764052 0.400157 0.978738 2.240893
2000-01-01 -0.103219 0.410599 0.144044 1.454274
2000-01-02 0.761038 0.121675 0.443863 0.333674
>>>
>>> df.where(~query)
Morning Noon Evening Midnight
1999-12-30 1.764052 0.400157 0.978738 2.240893
1999-12-31 NaN NaN NaN NaN
2000-01-01 -0.103219 0.410599 0.144044 1.454274
2000-01-02 0.761038 0.121675 0.443863 0.333674
2000-01-03 NaN NaN NaN NaN
2000-01-04 NaN NaN NaN NaN
2000-01-05 NaN NaN NaN NaN
5.9.4. Logical AND
In first and in second query
1 & 1 -> 1
1 & 0 -> 0
0 & 1 -> 0
0 & 0 -> 0
>>> df[ (df['Morning']<0.0) & (df['Midnight']<0.0) ]
Morning Noon Evening Midnight
2000-01-04 -2.55299 0.653619 0.864436 -0.742165
>>> query = (df['Morning'] < 0.0) & (df['Midnight'] < 0.0)
>>>
>>> df[query]
Morning Noon Evening Midnight
2000-01-04 -2.55299 0.653619 0.864436 -0.742165
>>> query1 = df['Morning'] < 0.0
>>> query2 = df['Midnight'] < 0.0
>>>
>>> df[query1 & query2]
Morning Noon Evening Midnight
2000-01-04 -2.55299 0.653619 0.864436 -0.742165
>>>
>>> df.where(query1 & query2)
Morning Noon Evening Midnight
1999-12-30 NaN NaN NaN NaN
1999-12-31 NaN NaN NaN NaN
2000-01-01 NaN NaN NaN NaN
2000-01-02 NaN NaN NaN NaN
2000-01-03 NaN NaN NaN NaN
2000-01-04 -2.55299 0.653619 0.864436 -0.742165
2000-01-05 NaN NaN NaN NaN
5.9.5. Logical OR
In first or in second query
1 | 1 -> 1
1 | 0 -> 1
0 | 1 -> 1
0 | 0 -> 0
>>> query1 = df['Morning'] < 0.0
>>> query2 = df['Midnight'] < 0.0
>>>
>>> df[query1 | query2]
Morning Noon Evening Midnight
1999-12-31 1.867558 -0.977278 0.950088 -0.151357
2000-01-01 -0.103219 0.410599 0.144044 1.454274
2000-01-03 1.494079 -0.205158 0.313068 -0.854096
2000-01-04 -2.552990 0.653619 0.864436 -0.742165
2000-01-05 2.269755 -1.454366 0.045759 -0.187184
>>>
>>> df.where(query1 | query2)
Morning Noon Evening Midnight
1999-12-30 NaN NaN NaN NaN
1999-12-31 1.867558 -0.977278 0.950088 -0.151357
2000-01-01 -0.103219 0.410599 0.144044 1.454274
2000-01-02 NaN NaN NaN NaN
2000-01-03 1.494079 -0.205158 0.313068 -0.854096
2000-01-04 -2.552990 0.653619 0.864436 -0.742165
2000-01-05 2.269755 -1.454366 0.045759 -0.187184
5.9.6. Logical XOR
In first or in second, but not in both queries
1 ^ 1 -> 0
1 ^ 0 -> 1
0 ^ 1 -> 1
0 ^ 0 -> 0
>>> query1 = df['Morning'] < 0.0
>>> query2 = df['Midnight'] < 0.0
>>>
>>> df[query1 ^ query2]
Morning Noon Evening Midnight
1999-12-31 1.867558 -0.977278 0.950088 -0.151357
2000-01-01 -0.103219 0.410599 0.144044 1.454274
2000-01-03 1.494079 -0.205158 0.313068 -0.854096
2000-01-05 2.269755 -1.454366 0.045759 -0.187184
>>>
>>> df.where(query1 ^ query2)
Morning Noon Evening Midnight
1999-12-30 NaN NaN NaN NaN
1999-12-31 1.867558 -0.977278 0.950088 -0.151357
2000-01-01 -0.103219 0.410599 0.144044 1.454274
2000-01-02 NaN NaN NaN NaN
2000-01-03 1.494079 -0.205158 0.313068 -0.854096
2000-01-04 NaN NaN NaN NaN
2000-01-05 2.269755 -1.454366 0.045759 -0.187184
5.9.7. Assignments
# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author
# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`
# %% About
# - Name: DataFrame Select
# - Difficulty: easy
# - Lines: 5
# - Minutes: 3
# %% English
# 1. Load data from `DATA` as `df: pd.DataFrame`
# 2. Select rows where 'petal_length' is above 2.0
# 3. Display first 5 rows
# 4. Do not use `.query()`
# 5. Run doctests - all must succeed
# %% Polish
# 1. Wczytaj dane z `DATA` jako `df: pd.DataFrame`
# 2. Wybierz wiersze, gdzie wartość 'petal_length' jest powyżej 2.0
# 3. Wyświetl 5 pierwszych wierszy
# 4. Nie używaj `.query()`
# 5. Uruchom doctesty - wszystkie muszą się powieść
# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> pd.set_option('display.width', 500)
>>> pd.set_option('display.max_columns', 10)
>>> pd.set_option('display.max_rows', 10)
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is pd.DataFrame, \
'Variable `result` must be a `pd.DataFrame` type'
>>> result # doctest: +NORMALIZE_WHITESPACE
sepal_length sepal_width petal_length petal_width species
1 5.9 3.0 5.1 1.8 virginica
2 6.0 3.4 4.5 1.6 versicolor
3 7.3 2.9 6.3 1.8 virginica
4 5.6 2.5 3.9 1.1 versicolor
6 5.5 2.6 4.4 1.2 versicolor
"""
import pandas as pd
DATA = 'https://python3.info/_static/iris-clean.csv'
# type: pd.DataFrame
result = ...