4.9. Series NA
https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html
https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data-na
Experimental: the behaviour of pd.NA can still change without warning.
None, float('nan'), np.nan, pd.NA
4.9.1. SetUp
>>> import pandas as pd
>>> import numpy as np
4.9.2. Boolean Value
>>> bool(None)
False
>>> bool(float('nan'))
True
>>> bool(np.nan)
True
>>> bool(pd.NA)
Traceback (most recent call last):
TypeError: boolean value of NA is ambiguous
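Because truthiness is inconsistent across these markers (and raises for pd.NA), a safer way to ask "is this value missing?" is pd.isna(). The following is a minimal sketch; the names candidates and flags are illustrative only.

```python
import numpy as np
import pandas as pd

# The four missing-value markers discussed in this section.
candidates = [None, float('nan'), np.nan, pd.NA]

# pd.isna() tests for missingness without calling bool() on the value,
# so it does not raise TypeError for pd.NA.
flags = [bool(pd.isna(value)) for value in candidates]
print(flags)
```

All four markers are reported as missing, regardless of how bool() treats them.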
4.9.3. Type
>>> pd.Series([1, None, 3]).dtype
dtype('float64')
>>> pd.Series([1.0, None, 3.0]).dtype
dtype('float64')
>>> pd.Series([True, None, False]).dtype
dtype('O')
>>> pd.Series(['a', None, 'c']).dtype
dtype('O')
>>> pd.Series([1, float('nan'), 3]).dtype
dtype('float64')
>>> pd.Series([1.0, float('nan'), 3.0]).dtype
dtype('float64')
>>> pd.Series([True, float('nan'), False]).dtype
dtype('O')
>>> pd.Series(['a', float('nan'), 'c']).dtype
dtype('O')
>>> pd.Series([1, np.nan, 3]).dtype
dtype('float64')
>>> pd.Series([1.0, np.nan, 3.0]).dtype
dtype('float64')
>>> pd.Series([True, np.nan, False]).dtype
dtype('O')
>>> pd.Series(['a', np.nan, 'c']).dtype
dtype('O')
>>> pd.Series([1, pd.NA, 3]).dtype
dtype('O')
>>> pd.Series([1.0, pd.NA, 3.0]).dtype
dtype('O')
>>> pd.Series([True, pd.NA, False]).dtype
dtype('O')
>>> pd.Series(['a', pd.NA, 'c']).dtype
dtype('O')
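The examples above rely on dtype inference, which falls back to float64 or object when a value is missing. As a hedged sketch, requesting one of the nullable extension dtypes explicitly ('Int64', 'boolean', 'string') keeps the intended type and stores the gap as pd.NA. The variable names are illustrative.

```python
import pandas as pd

# Explicit nullable dtypes: the missing entry becomes pd.NA
# instead of forcing a float64 or object Series.
ints = pd.Series([1, None, 3], dtype='Int64')
bools = pd.Series([True, None, False], dtype='boolean')
strings = pd.Series(['a', None, 'c'], dtype='string')

for series in (ints, bools, strings):
    print(series.dtype, series.isna().tolist())
```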
4.9.4. Comparison
>>> None == None
True
>>> None == float('nan')
False
>>> None == np.nan
False
>>> None == pd.NA
False
>>> float('nan') == None
False
>>> float('nan') == float('nan')
False
>>> float('nan') == np.nan
False
>>> float('nan') == pd.NA
<NA>
>>> np.nan == None
False
>>> np.nan == float('nan')
False
>>> np.nan == np.nan
False
>>> np.nan == pd.NA
<NA>
>>> pd.NA == None
False
>>> pd.NA == float('nan')
<NA>
>>> pd.NA == np.nan
<NA>
>>> pd.NA == pd.NA
<NA>
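Since NaN never compares equal to anything, including itself, an equality test cannot be used to locate missing values in a Series. A small sketch of the consequence, with illustrative variable names:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Element-wise equality with NaN is False everywhere, even at the NaN position.
by_equality = s == np.nan

# isna() is the reliable way to locate the missing entry.
by_isna = s.isna()

print(by_equality.tolist())
print(by_isna.tolist())
```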
4.9.5. Identity
>>> None is None
True
>>> None is float('nan')
False
>>> None is np.nan
False
>>> None is pd.NA
False
>>> float('nan') is None
False
>>> float('nan') is float('nan')
False
>>> float('nan') is np.nan
False
>>> float('nan') is pd.NA
False
>>> np.nan is None
False
>>> np.nan is float('nan')
False
>>> np.nan is np.nan
True
>>> np.nan is pd.NA
False
>>> pd.NA is None
False
>>> pd.NA is float('nan')
False
>>> pd.NA is np.nan
False
>>> pd.NA is pd.NA
True
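The identity results have a practical side effect: Python's in operator checks identity before equality, so a membership test can find np.nan (a single shared object) but not a freshly created float('nan'). A minimal sketch, assuming nothing beyond the standard library and NumPy:

```python
import math

import numpy as np

shared = [np.nan]        # holds the one np.nan object
fresh = [float('nan')]   # holds a separately created NaN object

# True: `in` finds the very same object by identity.
found_shared = np.nan in shared

# False: a new NaN is a different object and NaN != NaN.
found_fresh = float('nan') in fresh

# Checking by value with math.isnan() works in both cases.
found_by_value = any(math.isnan(x) for x in fresh)

print(found_shared, found_fresh, found_by_value)
```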
4.9.6. Check
s.any() and s.all()
Negated ~ versions of all above methods
>>> s = pd.Series([1.0, np.nan, 3.0])
>>> s
0 1.0
1 NaN
2 3.0
dtype: float64
>>> s.any()
np.True_
>>> ~s.any()
np.False_
>>> s.all()
np.True_
>>> ~s.all()
np.False_
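Note that s.any() and s.all() test truthiness, and NaN is truthy, so they do not reveal whether data is missing. One possible way to check for missing values is to combine them with isna() and notna(), as in this sketch (variable names are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Is at least one value missing?
has_missing = bool(s.isna().any())

# Are all values present?
all_present = bool(s.notna().all())

# How many values are missing? True counts as 1 when summed.
missing_count = int(s.isna().sum())

print(has_missing, all_present, missing_count)
```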
4.9.7. Select
s.isnull() and s.notnull()
s.isna() and s.notna()
Negated ~ versions of all above methods
>>> s = pd.Series([1.0, np.nan, 3.0])
>>> s
0 1.0
1 NaN
2 3.0
dtype: float64
>>> s.isnull()
0 False
1 True
2 False
dtype: bool
>>> ~s.isnull()
0 True
1 False
2 True
dtype: bool
>>> s.notnull()
0 True
1 False
2 True
dtype: bool
>>> ~s.notnull()
0 False
1 True
2 False
dtype: bool
>>> s = pd.Series([1.0, np.nan, 3.0])
>>> s
0 1.0
1 NaN
2 3.0
dtype: float64
>>>
>>> s.isna()
0 False
1 True
2 False
dtype: bool
>>>
>>> s.notna()
0 True
1 False
2 True
dtype: bool
>>>
>>> ~s.isna()
0 True
1 False
2 True
dtype: bool
>>>
>>> ~s.notna()
0 False
1 True
2 False
dtype: bool
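The boolean masks returned by these methods are typically used for selection. A short sketch of boolean indexing with them, using illustrative variable names:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Keep only the rows where a value is present.
present = s[s.notna()]

# Collect the index labels of the rows where a value is missing.
missing_labels = s[s.isna()].index.tolist()

print(present)
print(missing_labels)
```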
4.9.8. Update
Works with the inplace=True parameter.
>>> s = pd.Series([1.0, None, None, 4.0, None, 6.0])
>>> s
0 1.0
1 NaN
2 NaN
3 4.0
4 NaN
5 6.0
dtype: float64
Fill NA - Scalar value:
>>> s.fillna(0.0)
0 1.0
1 0.0
2 0.0
3 4.0
4 0.0
5 6.0
dtype: float64
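To illustrate the inplace=True remark, the sketch below contrasts the default behaviour (a new Series is returned, the original is untouched) with in-place modification. The return value of the in-place call is deliberately ignored here, since relying on it is not needed.

```python
import pandas as pd

s = pd.Series([1.0, None, None, 4.0, None, 6.0])

# Default: fillna() returns a new Series and leaves `s` unchanged.
filled = s.fillna(0.0)
missing_before = int(s.isna().sum())

# inplace=True: `s` itself is modified.
s.fillna(0.0, inplace=True)
missing_after = int(s.isna().sum())

print(missing_before, missing_after)
```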
Forward Fill. ffill: propagate last valid observation forward:
>>> s.ffill()
0 1.0
1 1.0
2 1.0
3 4.0
4 4.0
5 6.0
dtype: float64
Backward Fill. bfill: use NEXT valid observation to fill gap:
>>> s.bfill()
0 1.0
1 4.0
2 4.0
3 4.0
4 6.0
5 6.0
dtype: float64
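Two details of directional filling are worth a sketch: the limit parameter caps how many consecutive gaps are filled, and a forward fill cannot fill a leading gap because there is no earlier observation. The second Series (named leading) is a made-up example for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, None, None, 4.0, None, 6.0])

# Fill at most one consecutive missing value after each valid observation.
limited = s.ffill(limit=1)

# A leading gap survives ffill(); chaining bfill() closes it.
leading = pd.Series([np.nan, 2.0, np.nan])
forward_only = leading.ffill()
both_ways = leading.ffill().bfill()

print(limited)
print(forward_only)
print(both_ways)
```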
Interpolate. method: str, default 'linear'; also accepts inplace=True:
>>> s.interpolate()
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 6.0
dtype: float64
Following method requires installation of scipy library:
>>> s.interpolate('nearest')
0 1.0
1 1.0
2 4.0
3 4.0
4 4.0
5 6.0
dtype: float64
Following method requires installation of scipy library:
>>> s.interpolate('polynomial', order=2)
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 6.0
dtype: float64
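| Method | Description |
|---|---|
| linear | Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes |
| time | Works on daily and higher resolution data to interpolate given length of interval |
| index, values | Use the actual numerical values of the index |
| pad | Fill in NA using existing values |
| nearest, zero, slinear, quadratic, cubic, barycentric, polynomial | Passed to scipy.interpolate.interp1d |
| krogh, piecewise_polynomial, spline, pchip, akima, cubicspline | Wrappers around the SciPy interpolation methods of similar names |
| from_derivatives | Refers to scipy.interpolate.BPoly.from_derivatives |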
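The difference between treating values as equally spaced and using the numerical values of the index only shows up when the index is unevenly spaced. The sketch below uses a made-up Series with index labels 0, 1 and 10; neither method requires scipy.

```python
import numpy as np
import pandas as pd

# Unevenly spaced index: the gap sits at label 1, between labels 0 and 10.
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' ignores the index: the gap is the midpoint of its neighbours.
by_position = s.interpolate(method='linear')

# 'index' uses the index values: label 1 is one tenth of the way from 0 to 10.
by_index = s.interpolate(method='index')

print(by_position.loc[1], by_index.loc[1])
```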
4.9.9. Drop
Drop Rows. Has inplace=True parameter:
>>> s = pd.Series([1.0, None, None, 4.0, None, 6.0])
>>> s
0 1.0
1 NaN
2 NaN
3 4.0
4 NaN
5 6.0
dtype: float64
>>>
>>> s.dropna()
0 1.0
3 4.0
5 6.0
dtype: float64
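As the output above suggests, dropna() keeps the original index labels of the surviving rows, so label-based and position-based access differ afterwards. A brief sketch, with illustrative variable names:

```python
import pandas as pd

s = pd.Series([1.0, None, None, 4.0, None, 6.0])

# Without inplace=True the original Series is left untouched.
dropped = s.dropna()

# The surviving rows keep their original labels 0, 3 and 5.
labels = dropped.index.tolist()

# Label 3 and position 1 refer to the same element after dropping.
by_label = dropped.loc[3]
by_position = dropped.iloc[1]

print(labels, by_label, by_position, len(s))
```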
4.9.10. Conversion
If you have a DataFrame or Series using traditional types that have missing data represented using np.nan, there are convenience methods convert_dtypes() in Series and DataFrame that can convert data to use the newer dtypes for integers, strings and booleans. This is especially helpful after reading in data sets when letting the readers such as read_csv() and read_excel() infer default dtypes.
>>> data = pd.read_csv('data/baseball.csv', index_col='id')
>>> data[data.columns[:10]].dtypes
player object
year int64
stint int64
team object
lg object
g int64
ab int64
r int64
h int64
X2b int64
dtype: object
>>> data = pd.read_csv('data/baseball.csv', index_col='id')
>>> data = data.convert_dtypes()
>>> data[data.columns[:10]].dtypes
player string
year Int64
stint Int64
team string
lg string
g Int64
ab Int64
r Int64
h Int64
X2b Int64
dtype: object
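The same conversion can be tried without the CSV file. The sketch below builds a small made-up DataFrame (the column values are invented for illustration) in which a missing year forces the column to float64, and then lets convert_dtypes() move it to the nullable Int64 dtype.

```python
import numpy as np
import pandas as pd

# Invented sample data: one missing entry in each column.
frame = pd.DataFrame({
    'player': ['alice', None, 'bob'],
    'year': [2006.0, np.nan, 2007.0],
})

# The missing value forced `year` to float64.
before = str(frame['year'].dtype)

# convert_dtypes() picks nullable dtypes; whole-number floats become Int64.
converted = frame.convert_dtypes()
after = str(converted['year'].dtype)

print(before, after)
print(converted)
```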
4.9.11. Assignments
# %% About
# - Name: Series NA
# - Difficulty: easy
# - Lines: 4
# - Minutes: 3
# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author
# %% English
# 1. From input data create `pd.Series`
# 2. Fill only first missing value with zero
# 3. Drop missing values
# 4. Reset the index (without keeping the old one)
# 5. Run doctests - all must succeed
# %% Hints
# - `pd.Series.fillna(limit)`
# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python has an invalid version; expected: `3.9` or newer.'
>>> assert result is not Ellipsis, \
'Variable `result` has an invalid value; assign result of your program to it.'
>>> assert type(result) is pd.Series, \
'Variable `result` has an invalid type; expected: `pd.Series`.'
>>> result
0 1.0
1 0.0
2 5.0
3 1.0
4 2.0
5 1.0
dtype: float64
"""
# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -f -v myfile.py`
# %% Imports
import pandas as pd
# %% Types
result: pd.Series
# %% Data
DATA = [1, None, 5, None, 1, 2, 1]
# %% Result
result = ...