4.9. Series NA
https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html
https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data-na
Experimental: the behaviour of pd.NA can still change without warning.
None
float('nan')
np.nan
pd.NA
4.9.1. SetUp
>>> import pandas as pd
>>> import numpy as np
4.9.2. Boolean Value
>>> bool(None)
False
>>> bool(float('nan'))
True
>>> bool(np.nan)
True
>>> bool(pd.NA)
Traceback (most recent call last):
TypeError: boolean value of NA is ambiguous
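Because bool(pd.NA) raises, a plain `if value:` test is unsafe when a value may be missing. A minimal sketch of one way around it, assuming a scalar input; the helper name `describe` is hypothetical and only serves the illustration:

```python
import pandas as pd


def describe(value):
    """Label a scalar without ever calling bool() on it."""
    # pd.isna() accepts None, float('nan'), np.nan and pd.NA alike
    if pd.isna(value):
        return 'missing'
    return 'present'


labels = [describe(v) for v in (None, float('nan'), pd.NA, 0, 'a')]
print(labels)
# ['missing', 'missing', 'missing', 'present', 'present']
```

Note that 0 is reported as present: it is falsy, but it is not a missing value.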
4.9.3. Type
>>> pd.Series([1, None, 3]).dtype
dtype('float64')
>>> pd.Series([1.0, None, 3.0]).dtype
dtype('float64')
>>> pd.Series([True, None, False]).dtype
dtype('O')
>>> pd.Series(['a', None, 'c']).dtype
dtype('O')
>>> pd.Series([1, float('nan'), 3]).dtype
dtype('float64')
>>> pd.Series([1.0, float('nan'), 3.0]).dtype
dtype('float64')
>>> pd.Series([True, float('nan'), False]).dtype
dtype('O')
>>> pd.Series(['a', float('nan'), 'c']).dtype
dtype('O')
>>> pd.Series([1, np.nan, 3]).dtype
dtype('float64')
>>> pd.Series([1.0, np.nan, 3.0]).dtype
dtype('float64')
>>> pd.Series([True, np.nan, False]).dtype
dtype('O')
>>> pd.Series(['a', np.nan, 'c']).dtype
dtype('O')
>>> pd.Series([1, pd.NA, 3]).dtype
dtype('O')
>>> pd.Series([1.0, pd.NA, 3.0]).dtype
dtype('O')
>>> pd.Series([True, pd.NA, False]).dtype
dtype('O')
>>> pd.Series(['a', pd.NA, 'c']).dtype
dtype('O')
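The object dtype above is what pandas infers when nothing else is requested. As a sketch, passing one of the nullable extension dtypes explicitly ('Int64', 'boolean', 'string') should keep a typed column while still storing pd.NA; the variable names are illustrative only:

```python
import pandas as pd

# Request a nullable extension dtype instead of relying on inference
ints = pd.Series([1, pd.NA, 3], dtype='Int64')
bools = pd.Series([True, pd.NA, False], dtype='boolean')
texts = pd.Series(['a', pd.NA, 'c'], dtype='string')

print(ints.dtype, bools.dtype)

# The missing entry is stored as pd.NA, not as a float NaN
print(ints.iloc[1] is pd.NA)
```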
4.9.4. Comparison
>>> None == None
True
>>> None == float('nan')
False
>>> None == np.nan
False
>>> None == pd.NA
False
>>> float('nan') == None
False
>>> float('nan') == float('nan')
False
>>> float('nan') == np.nan
False
>>> float('nan') == pd.NA
<NA>
>>> np.nan == None
False
>>> np.nan == float('nan')
False
>>> np.nan == np.nan
False
>>> np.nan == pd.NA
<NA>
>>> pd.NA == None
False
>>> pd.NA == float('nan')
<NA>
>>> pd.NA == np.nan
<NA>
>>> pd.NA == pd.NA
<NA>
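Since NaN never equals itself and comparisons with pd.NA return <NA>, the == operator cannot be used to find missing values. A small sketch of the difference, using pd.isna() for scalars and Series.isna() for a whole Series:

```python
import numpy as np
import pandas as pd

candidates = [None, float('nan'), np.nan, pd.NA]

# pd.isna() recognises every representation of a missing value
detected = [bool(pd.isna(value)) for value in candidates]
print(detected)  # [True, True, True, True]

s = pd.Series([1.0, np.nan, 3.0])

# Element-wise equality with NaN is False everywhere, so nothing is found
found_by_equality = bool((s == np.nan).any())

# isna() finds the gap
found_by_isna = bool(s.isna().any())

print(found_by_equality, found_by_isna)  # False True
```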
4.9.5. Identity
>>> None is None
True
>>> None is float('nan')
False
>>> None is np.nan
False
>>> None is pd.NA
False
>>> float('nan') is None
False
>>> float('nan') is float('nan')
False
>>> float('nan') is np.nan
False
>>> float('nan') is pd.NA
False
>>> np.nan is None
False
>>> np.nan is float('nan')
False
>>> np.nan is np.nan
True
>>> np.nan is pd.NA
False
>>> pd.NA is None
False
>>> pd.NA is float('nan')
False
>>> pd.NA is np.nan
False
>>> pd.NA is pd.NA
True
4.9.6. Check
Negated (~) versions of any() and all():
>>> s = pd.Series([1.0, np.nan, 3.0])
>>> s
0 1.0
1 NaN
2 3.0
dtype: float64
>>> s.any()
np.True_
>>> ~s.any()
np.False_
>>> s.all()
np.True_
>>> ~s.all()
np.False_
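By default any() and all() skip NaN, so the results above say nothing about whether data is missing. A short sketch of checks that do answer that question; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

has_missing = bool(s.isna().any())   # is at least one value missing?
all_missing = bool(s.isna().all())   # is every value missing?
n_missing = int(s.isna().sum())      # how many values are missing?
n_present = int(s.count())           # count() ignores missing values

print(has_missing, all_missing, n_missing, n_present)  # True False 1 2
print(s.hasnans)                                        # True
```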
4.9.7. Select
s.isnull() and s.notnull()
s.isna() and s.notna()
Negated (~) versions of all above methods
>>> s = pd.Series([1.0, np.nan, 3.0])
>>> s
0 1.0
1 NaN
2 3.0
dtype: float64
>>> s.isnull()
0 False
1 True
2 False
dtype: bool
>>> ~s.isnull()
0 True
1 False
2 True
dtype: bool
>>> s.notnull()
0 True
1 False
2 True
dtype: bool
>>> ~s.notnull()
0 False
1 True
2 False
dtype: bool
>>> s = pd.Series([1.0, np.nan, 3.0])
>>> s
0 1.0
1 NaN
2 3.0
dtype: float64
>>>
>>> s.isna()
0 False
1 True
2 False
dtype: bool
>>>
>>> s.notna()
0 True
1 False
2 True
dtype: bool
>>>
>>> ~s.isna()
0 True
1 False
2 True
dtype: bool
>>>
>>> ~s.notna()
0 False
1 True
2 False
dtype: bool
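The boolean Series returned by these methods can be used as a mask to select rows. A minimal sketch, reusing the same data as above:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Keep only the rows that hold a value
present = s[s.notna()]

# Keep only the rows that are missing (useful for locating gaps)
missing = s[s.isna()]

print(present.tolist())        # [1.0, 3.0]
print(present.index.tolist())  # [0, 2]
print(missing.index.tolist())  # [1]
```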
4.9.8. Update
Works with the inplace=True parameter.
>>> s = pd.Series([1.0, None, None, 4.0, None, 6.0])
>>> s
0 1.0
1 NaN
2 NaN
3 4.0
4 NaN
5 6.0
dtype: float64
Fill NA - Scalar value:
>>> s.fillna(0.0)
0 1.0
1 0.0
2 0.0
3 4.0
4 0.0
5 6.0
dtype: float64
Forward fill. ffill: propagate the last valid observation forward:
>>> s.ffill()
0 1.0
1 1.0
2 1.0
3 4.0
4 4.0
5 6.0
dtype: float64
Backward fill. bfill: use the next valid observation to fill the gap:
>>> s.bfill()
0 1.0
1 4.0
2 4.0
3 4.0
4 6.0
5 6.0
dtype: float64
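Forward and backward fill each leave a gap at one end of the Series when there is no valid observation to propagate from. The sketch below uses different data than above (an assumption made to expose both edge cases) and also shows the limit parameter, which caps how many consecutive gaps are filled:

```python
import pandas as pd

s = pd.Series([None, 2.0, None, None, 5.0, None])

forward = s.ffill()        # leading gap stays NaN: nothing precedes it
backward = s.bfill()       # trailing gap stays NaN: nothing follows it
both = s.ffill().bfill()   # forward first, then backward for the leading gap
capped = s.ffill(limit=1)  # fill at most one consecutive gap per run

print(both.tolist())            # [2.0, 2.0, 2.0, 2.0, 5.0, 5.0]
print(capped.isna().tolist())   # [True, False, False, True, False, False]
```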
Interpolate. method: str, default linear; also accepts the inplace=True parameter:
>>> s.interpolate()
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 6.0
dtype: float64
The following method requires the scipy library to be installed:
>>> s.interpolate('nearest')
0 1.0
1 1.0
2 4.0
3 4.0
4 4.0
5 6.0
dtype: float64
The following method requires the scipy library to be installed:
>>> s.interpolate('polynomial', order=2)
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 6.0
dtype: float64
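Method | Description
---|---
linear | Ignore the index and treat the values as equally spaced. This is the only method supported on MultiIndexes
time | Works on daily and higher resolution data to interpolate given length of interval
index, values | Use the actual numerical values of the index
pad | Fill in NA using existing values
nearest, zero, slinear, quadratic, cubic, barycentric, polynomial | Passed to scipy.interpolate.interp1d
krogh, piecewise_polynomial, spline, pchip, akima, cubicspline | Wrappers around the SciPy interpolation methods of similar names
from_derivatives | Refers to scipy.interpolate.BPoly.from_derivatives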
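The difference between treating values as equally spaced and using the actual numerical values of the index only shows up when the index is unevenly spaced. A minimal sketch, assuming a Series indexed by 0, 1 and 3 (data chosen for illustration; neither call needs scipy):

```python
import numpy as np
import pandas as pd

# The gap sits at index 1, between known points at index 0 and index 3
s = pd.Series([1.0, np.nan, 4.0], index=[0, 1, 3])

# 'linear' ignores the index: the gap is taken as the midpoint of 1.0 and 4.0
by_position = s.interpolate(method='linear')

# 'index' uses the index values: 1 lies one third of the way from 0 to 3
by_index = s.interpolate(method='index')

print(by_position.loc[1])  # 2.5
print(by_index.loc[1])     # 2.0
```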
4.9.9. Drop
Drop rows. Has the inplace=True parameter:
>>> s = pd.Series([1.0, None, None, 4.0, None, 6.0])
>>> s
0 1.0
1 NaN
2 NaN
3 4.0
4 NaN
5 6.0
dtype: float64
>>>
>>> s.dropna()
0 1.0
3 4.0
5 6.0
dtype: float64
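As the output above suggests, dropna() returns a new Series and keeps the original index labels, so the result has gaps in its index. A short sketch that makes both points explicit:

```python
import pandas as pd

s = pd.Series([1.0, None, None, 4.0, None, 6.0])

dropped = s.dropna()

# The original Series is left untouched
print(len(s), len(dropped))     # 6 3

# Labels of the surviving rows are preserved, not renumbered
print(dropped.index.tolist())   # [0, 3, 5]
```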
4.9.10. Conversion
If you have a DataFrame or Series using traditional types that have missing data represented using np.nan, there are convenience methods convert_dtypes() in Series and DataFrame that can convert data to use the newer dtypes for integers, strings and booleans. This is especially helpful after reading in data sets when letting the readers such as read_csv() and read_excel() infer default dtypes.
>>> data = pd.read_csv('data/baseball.csv', index_col='id')
>>> data[data.columns[:10]].dtypes
player object
year int64
stint int64
team object
lg object
g int64
ab int64
r int64
h int64
X2b int64
dtype: object
>>> data = pd.read_csv('data/baseball.csv', index_col='id')
>>> data = data.convert_dtypes()
>>> data[data.columns[:10]].dtypes
player string
year Int64
stint Int64
team string
lg string
g Int64
ab Int64
r Int64
h Int64
X2b Int64
dtype: object
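The example above depends on a CSV file. The same effect can be sketched on in-memory data; the column names and values below are made up for illustration and include missing entries, which the baseball columns shown above apparently do not:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'player': ['a', None, 'c'],
    'year': [2006.0, np.nan, 2007.0],   # integers forced to float by NaN
    'active': [True, None, False],      # booleans forced to object by None
})

before = {column: str(dtype) for column, dtype in df.dtypes.items()}
converted = df.convert_dtypes()
after = {column: str(dtype) for column, dtype in converted.dtypes.items()}

print(before['year'], '->', after['year'])      # float64 -> Int64
print(before['active'], '->', after['active'])  # object -> boolean

# Missing entries survive the conversion
print(converted['year'].isna().tolist())        # [False, True, False]
```

The dtype reported for the player column may differ between pandas versions, so it is not printed here.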
4.9.11. Assignments
# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author
# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`
# %% About
# - Name: Series NA
# - Difficulty: easy
# - Lines: 4
# - Minutes: 3
# %% English
# 1. From input data create `pd.Series`
# 2. Fill only first missing value with zero
# 3. Drop missing values
# 4. Reset the index (without keeping the old one)
# 5. Run doctests - all must succeed
# %% Polish
# 1. Z danych wejściowych stwórz `pd.Series`
# 2. Wypełnij tylko pierwszą brakującą wartość zerem
# 3. Usuń brakujące wartości
# 4. Zresetuj indeks (bez kopii starego)
# 5. Uruchom doctesty - wszystkie muszą się powieść
# %% Hints
# - `pd.Series.fillna(limit)`
# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'
>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is pd.Series, \
'Variable `result` has invalid type, should be `pd.Series`'
>>> result
0 1.0
1 0.0
2 5.0
3 1.0
4 2.0
5 1.0
dtype: float64
"""
import pandas as pd
DATA = [1, None, 5, None, 1, 2, 1]
# type: pd.Series
result = ...