Series Create

From Python sequence

  • list

  • tuple

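  • set (unordered, so convert it first, for example with sorted(); passing a set directly raises TypeError)

  • frozenset (same restriction as set)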
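The difference between ordered and unordered inputs can be sketched as follows. This is a minimal illustration with made-up values, not part of the original examples; the variable names are hypothetical.

```python
import pandas as pd

# Ordered sequences can be passed directly.
from_list = pd.Series([1, 2, 3])
from_tuple = pd.Series((1, 2, 3))

# A set has no defined order, so pandas refuses it.
try:
    pd.Series({1, 2, 3})
    set_rejected = False
except TypeError:
    set_rejected = True

# Converting to a sorted list first gives a well-defined order.
from_set = pd.Series(sorted({3, 1, 2}))
```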

>>> import numpy as np
>>> import pandas as pd
>>>
>>> data = [1,2,3]
>>> s = pd.Series(data)

From Python range

>>> data = range(0,5)
>>> s = pd.Series(data)

From Numpy ndarray

>>> data = np.arange(0,5)
>>> s = pd.Series(data)

From Date Range

  • From pd.Timestamp

  • From pd.date_range()

  • More information in Date and Time Types

>>> a = pd.Timestamp('2000-01-01')
>>> b = pd.Timestamp('2000-01-02')
>>> c = pd.Timestamp('2000-01-03')
>>>
>>> data = [a,b,c]
>>> s = pd.Series(data)
>>>
>>> s
0   2000-01-01
1   2000-01-02
2   2000-01-03
dtype: datetime64[ns]
>>> a = pd.Timestamp('2000-01-01 00:00:00.000000')
>>> b = pd.Timestamp('2000-01-01 00:00:00.000000')
>>> c = pd.Timestamp('2000-01-01 00:00:00.000000')
>>>
>>> data = [a,b,c]
>>> s = pd.Series(data)
>>> s
0   2000-01-01
1   2000-01-01
2   2000-01-01
dtype: datetime64[ns]
import datetime

pd.date_range(start=pd.Timestamp('2000-01-01'), end=pd.Timestamp('2000-01-05'))

pd.date_range(start=datetime.date(2000, 1, 1), end=datetime.date(2000, 1, 5))

pd.date_range(start='2000-01-01', end='2000-01-05')

>>> pd.date_range(start='2000-01-01', end='2000-01-05')
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04', '2000-01-05'], dtype='datetime64[ns]', freq='D')
>>> pd.date_range(start='2000-01-01', periods=5)
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04', '2000-01-05'], dtype='datetime64[ns]', freq='D')
>>> pd.date_range(start='2000-01-01', periods=5, freq='h')
DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:00:00', '2000-01-01 02:00:00', '2000-01-01 03:00:00', '2000-01-01 04:00:00'], dtype='datetime64[ns]', freq='h')
>>> pd.date_range(start='2000-01-01', periods=5, freq='W')
DatetimeIndex(['2000-01-02', '2000-01-09', '2000-01-16', '2000-01-23', '2000-01-30'], dtype='datetime64[ns]', freq='W-SUN')
>>> pd.date_range(start='2000-01-01', periods=5, freq='W-MON')
DatetimeIndex(['2000-01-03', '2000-01-10', '2000-01-17', '2000-01-24', '2000-01-31'], dtype='datetime64[ns]', freq='W-MON')

Length

>>> s = pd.Series(pd.date_range(start='2000-01-01', periods=5))
>>>
>>> len(s)
5
>>>
>>> s.size
5

Series Attributes

Size

NDim

  • Number of Dimensions

Shape
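The three attributes named above can be compared side by side. A minimal sketch, assuming a five-element Series of dates like the one printed in the Index example:

```python
import pandas as pd

s = pd.Series(pd.date_range(start='2000-01-01', periods=5))

size = s.size    # total number of elements
ndim = s.ndim    # number of dimensions, always 1 for a Series
shape = s.shape  # one-element tuple holding the length
```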

Index

  • More information in Pandas Series Index

>>> s.index
RangeIndex(start=0, stop=5, step=1)
>>> s
0   2000-01-01
1   2000-01-02
2   2000-01-03
3   2000-01-04
4   2000-01-05
dtype: datetime64[ns]

Values

>>> s.values
array(['2000-01-01T00:00:00.000000000', '2000-01-02T00:00:00.000000000',
       '2000-01-03T00:00:00.000000000', '2000-01-04T00:00:00.000000000',
       '2000-01-05T00:00:00.000000000'], dtype='datetime64[ns]')
>>> type(s.values)
<class 'numpy.ndarray'>

Series Index

  • Range Index

  • Index

  • Object Index

  • Datetime Index

  • Timedelta Index

  • Period Index

  • Interval Index

  • Categorical Index

  • Multi Index

Deprecation

Definition

Usage

Range Index

  • Default

>>> s = pd.Series(data=['a', 'b', 'c', 'd'])
>>> s
0    a
1    b
2    c
3    d
dtype: object
>>>
>>> s.index
RangeIndex(start=0, stop=4, step=1)
>>> s[0]
'a'
>>> s[1]
'b'
>>> s[2]
'c'
>>> s[3]
'd'
>>> s[4]
ValueError: 4 is not in range

The above exception was the direct cause of the following exception: KeyError: 4

>>> s[-1]
ValueError: -1 is not in range

The above exception was the direct cause of the following exception: KeyError: -1

Int64 Index

>>> s = pd.Series(data=['a', 'b', 'c', 'd'], index=[10,20,30,40])
>>>
>>> s
10    a
20    b
30    c
40    d
dtype: object
>>>
>>> s.index
Index([10, 20, 30, 40], dtype='int64')
>>> s[10]
'a'
>>> s[20]
'b'
>>> s[30]
'c'
>>> s[40]
'd'
>>> s[0]
KeyError: 0

The above exception was the direct cause of the following exception: KeyError: 0

>>> s[-1]
KeyError: -1

The above exception was the direct cause of the following exception: KeyError: -1

Float64 Index
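A float index behaves like the integer index above: `[]` and `.loc` look up labels, while `.iloc` looks up positions. A minimal sketch with hypothetical labels:

```python
import pandas as pd

s = pd.Series(data=['a', 'b', 'c', 'd'], index=[1.1, 2.2, 3.3, 4.4])

index_dtype = str(s.index.dtype)  # dtype of the labels
by_label = s.loc[2.2]             # label-based lookup
by_position = s.iloc[0]           # position-based lookup
```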

String Index

  • Positional access through .iloc still works, as with a RangeIndex

  • string.ascii_lowercase

  • string.ascii_uppercase

  • string.ascii_letters

  • string.hexdigits

  • string.digits

>>> s = pd.Series(data=['a', 'b', 'c', 'd'], index=['a', 'b', 'c', 'd'])
>>> s
a    a
b    b
c    c
d    d
dtype: object
>>> s = pd.Series(data=['a', 'b', 'c', 'd'], index=['x', 'y', 'z', 'w'])
>>> s
x    a
y    b
z    c
w    d
dtype: object
>>> import string
>>>
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>>
>>> list(string.ascii_uppercase)
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
>>>
>>> list(string.ascii_uppercase)[:10]
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
>>> data = range(0,5)
>>> index = list(string.ascii_uppercase)[:len(data)]
>>>
>>> s = pd.Series(data, index)
>>> s
A    0
B    1
C    2
D    3
E    4
dtype: int64
>>> s = pd.Series(data=['a', 'b', 'c', 'd'], index=['x', 'y', 'z', 'w'])
>>>
>>> s
x    a
y    b
z    c
w    d
dtype: object
>>>
>>> s['x']
'a'
>>> s['y']
'b'
>>> s['z']
'c'
>>> s['w']
'd'
>>> s[0]
FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
'a'
>>> s.iloc[0]
'a'
>>> s.loc['x']
'a'

Datetime Index

  • Positional access through .iloc still works, as with a RangeIndex

  • The default frequency of pd.date_range() is daily ('D')

  • Works also with ISO time format 1970-01-01T00:00:00

  • 00:00:00 is assumed if time is not provided

>>> data = range(0,5)
>>> index = pd.date_range(start='2000-01-01', periods=len(data))
>>>
>>> s = pd.Series(data, index)
>>>
>>> s
2000-01-01    0
2000-01-02    1
2000-01-03    2
2000-01-04    3
2000-01-05    4
Freq: D, dtype: int64
>>> index = pd.date_range(start='1999-12-29', periods=len(data))
>>> s = pd.Series(data, index)
>>>
>>> s
1999-12-29    0
1999-12-30    1
1999-12-31    2
2000-01-01    3
2000-01-02    4
Freq: D, dtype: int64
>>> s[pd.Timestamp('2000-01-01')]
3
>>> s['2000-01-01']
3
>>> s.loc['2000-01-01']
3
>>> s.iloc[3]
3

Further Reading

  • More information in Date and Time Frequency

  • More information in Date and Time Calendar

Series Sample

Tail

>>>
>>> s.tail(2)
2000-01-01    3
2000-01-02    4
Freq: D, dtype: int64

First

>>> s.first('W')
FutureWarning: first is deprecated and will be removed in a future version. Please create a mask and filter using `.loc` instead
1999-12-29    0
1999-12-30    1
1999-12-31    2
2000-01-01    3
2000-01-02    4
Freq: D, dtype: int64
>>> s.first('2W')
FutureWarning: first is deprecated and will be removed in a future version. Please create a mask and filter using `.loc` instead
1999-12-29    0
1999-12-30    1
1999-12-31    2
2000-01-01    3
2000-01-02    4
Freq: D, dtype: int64
>>>
>>> s.first('3D')
FutureWarning: first is deprecated and will be removed in a future version. Please create a mask and filter using `.loc` instead
1999-12-29    0
1999-12-30    1
1999-12-31    2
Freq: D, dtype: int64
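
The FutureWarning suggests a mask combined with `.loc`. One possible way to reproduce `s.first('3D')` is sketched below; the cutoff arithmetic is an assumption about what is wanted (rows strictly before the first timestamp plus three days), and the Series is rebuilt so the block runs on its own.

```python
import pandas as pd

data = range(0, 5)
index = pd.date_range(start='1999-12-29', periods=len(data))
s = pd.Series(data, index)

# Keep rows strictly before "first timestamp + 3 days".
cutoff = s.index[0] + pd.Timedelta(days=3)
first_3d = s.loc[s.index < cutoff]
```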

Last
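
Series.last() is deprecated in the same way as Series.first(). A minimal sketch of a mask-based equivalent for "the last two days", assuming the same five-day Series as in the Datetime Index section (rebuilt here so the block is self-contained):

```python
import pandas as pd

data = range(0, 5)
index = pd.date_range(start='1999-12-29', periods=len(data))
s = pd.Series(data, index)

# Keep rows strictly after "last timestamp - 2 days".
cutoff = s.index[-1] - pd.Timedelta(days=2)
last_2d = s.loc[s.index > cutoff]
```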

Sample

  • 1/4 is 25%

  • .05 is 5%

  • 0.5 is 50%

  • 1.0 is 100%

Reset Index

df.sample(frac=1.0).reset_index(drop=True)
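
The same pattern works on a Series. A small sketch with made-up data; `random_state` is set only to make the draw repeatable:

```python
import pandas as pd

s = pd.Series(range(8))

# frac is a fraction of the length: 1/4 of 8 rows gives 2 rows.
quarter = s.sample(frac=1/4, random_state=0)

# n asks for an absolute number of rows instead.
picked = s.sample(n=3, random_state=0)

# frac=1.0 shuffles everything; reset_index(drop=True) renumbers from 0.
shuffled = s.sample(frac=1.0, random_state=0).reset_index(drop=True)
```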

Series Getitem

>>> s = pd.Series(['a', 'b', 'c', 'd'])
>>>
>>> s
0    a
1    b
2    c
3    d
dtype: object
>>>
>>> s[0]
'a'
>>>
>>> s[-1]
ValueError: -1 is not in range

The above exception was the direct cause of the following exception: KeyError: -1

>>> s.iloc[-1]
'd'
>>>
>>> s.loc[-1]
ValueError: -1 is not in range

The above exception was the direct cause of the following exception: KeyError: -1

>>>
>>> s.loc[0]
'a'
>>> s.iloc[0]
'a'
>>> data = [1,2,3,4]
>>> index = ['a', 'b', 'c', 'd']
>>>
>>> s = pd.Series(data, index)
>>>
>>> s
a    1
b    2
c    3
d    4
dtype: int64
>>>
>>>
>>> s['a']
1
>>>
>>> s['-a']
KeyError: '-a'

The above exception was the direct cause of the following exception: KeyError: '-a'

>>>
>>> s.loc['a']
1
>>> s.loc['b']
2
>>> s.loc['c']
3
>>> s.loc['d']
4
>>>
>>> s.iloc[0]
1
>>> s.iloc[1]
2
>>> s.iloc[2]
3
>>> s.iloc[3]
4
>>> s.iloc[4]
IndexError: single positional indexer is out-of-bounds
>>> s.iloc[-1]
4
>>> s.iloc[-2]
3
>>> s.iloc[-3]
2
>>> s.iloc[-4]
1
>>> s.iloc[-5]
IndexError: single positional indexer is out-of-bounds

Range Index

Float and Int Index

String Index

Date Index

Series Slice

Numeric Index

>>> data = [1,2,3,4]
>>> index = ['a', 'b', 'c', 'd']
>>> s = pd.Series(data, index)
>>> s
a    1
b    2
c    3
d    4
dtype: int64
>>> s[0:3]
a    1
b    2
c    3
dtype: int64
>>> s.iloc[0:3]
a    1
b    2
c    3
dtype: int64
>>> s['a':'c']
a    1
b    2
c    3
dtype: int64
>>> s.loc['a':'c']
a    1
b    2
c    3
dtype: int64
>>> s.loc['a':'c':2]
a    1
c    3
dtype: int64

String Index

  • With a string index, both the lower and the upper bound of a label slice are inclusive

  • A string-indexed Series can still be sliced by position with .iloc (upper bound exclusive)

>>> data = [1,2,3,4]
>>> index = ['x', 'a', 'y', 'b']
>>> s = pd.Series(data, index)
>>> s['x':'b']
x    1
a    2
y    3
b    4
dtype: int64
>>> s.loc['x':'b']
x    1
a    2
y    3
b    4
dtype: int64
>>> s.iloc[0:4]
x    1
a    2
y    3
b    4
dtype: int64

Date Index

>>> s = pd.Series(
...     data = [1, 2, 3, 4, 5, 6, 7],
...     index = pd.date_range(start='1999-12-28', periods=7))
>>>
>>> s
1999-12-28    1
1999-12-29    2
1999-12-30    3
1999-12-31    4
2000-01-01    5
2000-01-02    6
2000-01-03    7
Freq: D, dtype: int64
>>> s['2000-01-01']
5
>>> s['2000-01']
2000-01-01    5
2000-01-02    6
2000-01-03    7
Freq: D, dtype: int64
>>> s['2000']
2000-01-01    5
2000-01-02    6
2000-01-03    7
Freq: D, dtype: int64
>>> s['1999-12-30':'2000-01-02']
1999-12-30    3
1999-12-31    4
2000-01-01    5
2000-01-02    6
Freq: D, dtype: int64
>>>
>>> s['1999-12-30':'2000-01']
1999-12-30    3
1999-12-31    4
2000-01-01    5
2000-01-02    6
2000-01-03    7
Freq: D, dtype: int64
>>>
>>> s['1999-12-30':'2000']
1999-12-30    3
1999-12-31    4
2000-01-01    5
2000-01-02    6
2000-01-03    7
Freq: D, dtype: int64
>>> s['1999-12':'2000']
1999-12-28    1
1999-12-29    2
1999-12-30    3
1999-12-31    4
2000-01-01    5
2000-01-02    6
2000-01-03    7
Freq: D, dtype: int64
>>> s['1999':'2000']
1999-12-28    1
1999-12-29    2
1999-12-30    3
1999-12-31    4
2000-01-01    5
2000-01-02    6
2000-01-03    7
Freq: D, dtype: int64
>>> s.loc['2000']
2000-01-01    5
2000-01-02    6
2000-01-03    7
Freq: D, dtype: int64
>>> s.iloc[-1]
7
>>> s.iloc[-3:]
2000-01-01    5
2000-01-02    6
2000-01-03    7
Freq: D, dtype: int64
>>> s.head(3)
1999-12-28    1
1999-12-29    2
1999-12-30    3
Freq: D, dtype: int64
>>> s.head(n=3)
1999-12-28    1
1999-12-29    2
1999-12-30    3
Freq: D, dtype: int64

Series NA

Boolean Value

Type

Comparison

Identity
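
The headings above (boolean value, type, comparison, identity) can be illustrated for `pd.NA` in a few lines. This is a sketch of the behaviour as I understand it, with hypothetical variable names:

```python
import pandas as pd

na_type = type(pd.NA).__name__  # name of the NA singleton's class
comparison = pd.NA == 1         # NA propagates through comparisons
same_object = pd.NA is pd.NA    # pd.NA is a singleton

# The truth value of NA is ambiguous, so bool() raises.
try:
    bool(pd.NA)
    ambiguous = False
except TypeError:
    ambiguous = True

is_missing = pd.isna(pd.NA)     # the reliable way to check for NA
```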

Check

  • Negated ~ versions of all above methods

Select

  • s.isnull() and s.notnull()

  • s.isna() and s.notna()

  • Negated ~ versions of all above methods
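
Selecting with these masks can look like this; a minimal sketch on a small nullable-integer Series:

```python
import pandas as pd

s = pd.Series([1, 2, pd.NA, 4], dtype='Int64')

missing = s[s.isna()]        # only the NA rows
present = s[s.notna()]       # only the non-NA rows
also_present = s[~s.isna()]  # negated mask, same result as notna()
```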

Update

  • Works with inplace=True parameter.

Drop

Conversion

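  • If you have a DataFrame or Series using traditional types that have missing data represented using np.nan, convert_dtypes() converts it to the nullable dtypes that use pd.NA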
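
A short sketch of such a conversion, assuming a float Series whose only non-integer value is np.nan:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan])   # float64, because np.nan is a float
converted = s.convert_dtypes()  # nullable integer dtype with pd.NA

old_dtype = str(s.dtype)
new_dtype = str(converted.dtype)
```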

>>> s = pd.Series([1,2,3,pd.NA,5,6,pd.NA,8,9], dtype='Int64')
>>>
>>> s
0       1
1       2
2       3
3    <NA>
4       5
5       6
6    <NA>
7       8
8       9
dtype: Int64
>>> s.dropna()
0    1
1    2
2    3
4    5
5    6
7    8
8    9
dtype: Int64
>>> s.fillna(0)
0    1
1    2
2    3
3    0
4    5
5    6
6    0
7    8
8    9
dtype: Int64
>>> s.ffill()
0    1
1    2
2    3
3    3
4    5
5    6
6    6
7    8
8    9
dtype: Int64
>>> s.bfill()
0    1
1    2
2    3
3    5
4    5
5    6
6    8
7    8
8    9
dtype: Int64
>>> s.interpolate()
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
6    7.0
7    8.0
8    9.0
dtype: Float64
>>> s.interpolate('quadratic')
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
6    7.0
7    8.0
8    9.0
dtype: Float64
>>> s.interpolate('polynomial', order=3)
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    6.0
6    7.0
7    8.0
8    9.0
dtype: Float64

Series Alter

Drop Rows

  • Drop element at index

  • Works with inplace=True

Drop Duplicates

  • Works with inplace=True

Reset Index

  • Works with inplace=True

  • drop=True prevents the old index being added as a column
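
The effect of `drop=True` is easiest to see after rows have been removed. A minimal sketch with made-up data:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'])

dropped = s.drop(1)                          # index is now 0, 2, 3
renumbered = dropped.reset_index(drop=True)  # index is 0, 1, 2 again
as_frame = dropped.reset_index()             # old index kept as a column
```

Without `drop=True` the result is a DataFrame, because the old index becomes a regular column.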

Series Sort

>>> s.drop(1)
0       1
2       3
3    <NA>
4       5
5       6
6    <NA>
7       8
8       9
dtype: Int64
>>> s.drop([1,2,3])
0       1
4       5
5       6
6    <NA>
7       8
8       9
dtype: Int64
>>>
>>> s.drop_duplicates()
0       1
1       2
2       3
3    <NA>
4       5
5       6
7       8
8       9
dtype: Int64
>>>
>>> s.sort_values()
0       1
1       2
2       3
4       5
5       6
7       8
8       9
3    <NA>
6    <NA>
dtype: Int64
>>>
>>> s.sort_values(ascending=True)
0       1
1       2
2       3
4       5
5       6
7       8
8       9
3    <NA>
6    <NA>
dtype: Int64
>>> s.sort_values(ascending=False)
8       9
7       8
5       6
4       5
2       3
1       2
0       1
3    <NA>
6    <NA>
dtype: Int64
>>>
>>> s.sort_index()
0       1
1       2
2       3
3    <NA>
4       5
5       6
6    <NA>
7       8
8       9
dtype: Int64
>>> s.sort_index(ascending=False)
8       9
7       8
6    <NA>
5       6
4       5
3    <NA>
2       3
1       2
0       1
dtype: Int64

Sort Values
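
By default sort_values() places missing values at the end, for both ascending and descending order. The `na_position` parameter changes that; a small sketch:

```python
import pandas as pd

s = pd.Series([3, pd.NA, 1], dtype='Int64')

na_last = s.sort_values()                      # default: NA at the end
na_first = s.sort_values(na_position='first')  # NA at the beginning
```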

Sort Index

Series Arithmetic

Vectorized Operations

  • s + 2, s.add(2), s.__add__(2)

  • s - 2, s.sub(2), s.subtract(2), s.__sub__(2)

  • s * 2, s.mul(2), s.multiply(2), s.__mul__(2)

  • s ** 2, s.pow(2), s.__pow__(2)

  • s ** (1/2), s.pow(1/2), s.__pow__(1/2)

  • s / 2, s.div(2), s.divide(2), s.truediv(2), s.__truediv__(2)

  • s // 2, s.floordiv(2), s.__floordiv__(2)

  • s % 2, s.mod(2), s.__mod__(2)

  • divmod(s, 2), s.divmod(2), s.__divmod__(2), (s//2, s%2)
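
The operator and method spellings listed above give the same result. A quick sketch that checks a few of them on made-up data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

same_add = (s + 2).equals(s.add(2))
same_true_div = (s / 2).equals(s.truediv(2))
same_floor_div = (s // 2).equals(s.floordiv(2))

# divmod returns a pair: (floor quotient, remainder).
quotient, remainder = divmod(s, 2)
```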

>>> result = (s
...     .add(100)
...     .mul(20)
...     .div(10)
...     .mod(8)
... )
>>>
>>> result
0     2.0
1     4.0
2     6.0
3    <NA>
4     2.0
5     4.0
6    <NA>
7     0.0
8     2.0
dtype: Float64

Broadcasting

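  • Aligns both Series on their index labels; the result uses the union of the two indexes (an outer join), and a label present on only one side produces NaN

  • fill_value: If data in both corresponding Series locations is missing, the result is missing as well; otherwise the missing side is replaced by fill_value before the operation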
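
Index alignment and `fill_value` can be seen on two Series with partly overlapping labels. A minimal sketch with hypothetical data:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([10, 20], index=['b', 'c'])

# Label 'a' exists only in the first Series, so the sum there is NaN.
total = a + b

# fill_value=0 substitutes the missing side before adding.
filled = a.add(b, fill_value=0)
```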

Series Statistics

Count

  • Series.count() - Number of non-null observations

Sum

  • Series.sum() - Sum of values

  • Series.cumsum() - Cumulative sum

Product

  • Series.prod() - Product of values

  • Series.cumprod() - Cumulative product
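
A brief sketch of the counting and accumulating methods on a Series with one missing value (made-up data):

```python
import pandas as pd

s = pd.Series([1, 2, None, 4])

count = s.count()      # non-null observations only
total = s.sum()        # missing values are skipped
product = s.prod()
running = s.cumsum()   # the missing position stays NaN
```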

Extremes

  • Series.min() - Minimum value

  • Series.idxmin() - Index of minimum value (Float, Int, Object, Datetime, Index)

  • Series.argmin() - Integer position of the minimum value

  • Series.cummin() - Cumulative minimum

  • Series.max() - Maximum value

  • Series.idxmax() - Index of maximum value (Float, Int, Object, Datetime, Index)

  • Series.argmax() - Integer position of the maximum value

  • Series.cummax() - Cumulative maximum
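
The difference between idxmin() and argmin() shows up once the index is not a plain range. A small sketch with hypothetical labels:

```python
import pandas as pd

s = pd.Series([30, 10, 20], index=['x', 'y', 'z'])

smallest = s.min()
label_of_min = s.idxmin()     # index label of the minimum
position_of_min = s.argmin()  # integer position of the minimum
running_min = s.cummin()

label_of_max = s.idxmax()
position_of_max = s.argmax()
```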

Average

Distribution

Describe
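
The average, distribution and describe methods can be sketched together on a tiny Series; the data is made up for illustration:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

mean = s.mean()
median = s.median()
std = s.std()                   # sample standard deviation (ddof=1)
first_quartile = s.quantile(0.25)

# describe() bundles count, mean, std, min, quartiles and max.
summary = s.describe()
```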

Series Mapping

  • Using str methods for cleaning user input

  • Cleaning data typically takes up most of the work in machine learning and data science (a share of 80% is often quoted)

  • Series.apply - apply a function to each value; extra args and kwargs are passed on to the function

  • Series.map - translate each value using a function or a dict

  • DataFrame.apply

  • DataFrame.map

  • DataFrame.applymap (deprecated)
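
The three Series-level tools from the list can be sketched on small made-up inputs; the helper `add` and all data are hypothetical:

```python
import pandas as pd

# .str methods: vectorized string cleaning.
names = pd.Series([' mark ', 'MELISSA', 'rick'])
cleaned = names.str.strip().str.title()

# map with a dict: values without an entry become NaN.
codes = pd.Series([0, 1, 2])
labels = codes.map({0: 'setosa', 1: 'virginica'})

# apply forwards keyword arguments to the function.
def add(x, n):
    return x + n

shifted = codes.apply(add, n=10)
```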

>>> s = pd.Series(['a', 'b', 'c', 'd'])
>>>
>>> s
0    a
1    b
2    c
3    d
dtype: object
>>> s.upper()
AttributeError: 'Series' object has no attribute 'upper'
>>> def upper(x: str) -> str:
...     return x.upper()
...
>>> s.map(upper)
0    A
1    B
2    C
3    D
dtype: object
>>> s = pd.Series(data=[
...     'Mark',
...     'Melissa',
...     'Rick',
... ])
>>>
>>>
>>>
>>> s
0       Mark
1    Melissa
2       Rick
dtype: object
>>>
>>>
>>> s.map(str.upper)
0       MARK
1    MELISSA
2       RICK
dtype: object
>>> s = pd.Series(data=[
...     '2000-01-01',
...     '2000-01-02',
...     '2000-01-03',
... ])
>>>
>>> from datetime import date
>>>
>>> s.map(date.fromisoformat)
0    2000-01-01
1    2000-01-02
2    2000-01-03
dtype: object
>>>
>>>
>>> s.map(pd.Timestamp)
0   2000-01-01
1   2000-01-02
2   2000-01-03
dtype: datetime64[ns]
>>> pd.to_datetime('2000-01-01')
Timestamp('2000-01-01 00:00:00')
>>>
>>> s.map(pd.to_datetime)
0   2000-01-01
1   2000-01-02
2   2000-01-03
dtype: datetime64[ns]
>>> pd.to_datetime(s)
0   2000-01-01
1   2000-01-02
2   2000-01-03
dtype: datetime64[ns]
>>> pd.to_numeric('1.0')
1.0
>>> pd.to_numeric('1')
1
>>> x = pd.Series(['1', '2.0', '3', '4.5'])
>>>
>>> x.map(pd.to_numeric)
0    1.0
1    2.0
2    3.0
3    4.5
dtype: float64
>>> pd.to_numeric(x)
0    1.0
1    2.0
2    3.0
3    4.5
dtype: float64

Apply

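>>> s = pd.Series([1.1111, 2.2222, 3.3333, 4.4444])  # sample floats with more than two decimals
>>>
>>> s.round(2)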
0    1.11
1    2.22
2    3.33
3    4.44
dtype: float64
>>> s.map(round, ndigits=2)
TypeError: Series.map() got an unexpected keyword argument 'ndigits'
>>> round(1.1111, ndigits=2)
1.11
>>> s.apply(round, ndigits=2)
0    1.11
1    2.22
2    3.33
3    4.44
dtype: float64
>>> round(1.1111, 2)
1.11
>>> s.apply(round, args=(2,))
0    1.11
1    2.22
2    3.33
3    4.44
dtype: float64

Map

>>> s = pd.read_csv('https://python3.info/_static/iris-dirty.csv', names=['sl', 'sw', 'pl', 'pw', 'species'], skiprows=1)['species']
>>>
>>> s.map({
...     0: 'setosa',
...     1: 'virginica',
... })
0       setosa
1          NaN
2    virginica
3          NaN
4    virginica
5       setosa
6    virginica
7    virginica

Normalization

>>> x = pd.Series(data=[
...     'Mark',
...     'Melissa',
...     'Rick',
... ])
>>>
>>> x.map(str.upper)
0       MARK
1    MELISSA
2       RICK
dtype: object
>>> x = pd.Series(data=[
...     'Mark Watney',
...     'Melissa Lewis',
...     'Rick Martinez',
... ])
>>>
>>> def gdpr(fullname):
...     firstname, lastname = fullname.split()
...     return f'{firstname} {lastname[0]}.'
...
>>> x.map(gdpr)
0       Mark W.
1    Melissa L.
2       Rick M.
dtype: object

Numbers

>>> data = pd.Series([
...     'Łukasz Watney',
...     'ąść Lewis',
...     'Rick Martinez',
... ])
>>> PL = {
...     'ą': 'a',
...     'ś': 's',
...     'ć': 'c',
...     'Ł': 'L',
... }
>>>
>>> def translate(text):
...     return ''.join(PL.get(x,x) for x in text)
...
>>> data.map(translate)
0    Lukasz Watney
1        asc Lewis
2    Rick Martinez
dtype: object

Addresses

Phone Numbers
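
One possible normalization is to strip everything except digits. A minimal sketch; the phone numbers are invented and the "digits only" target format is an assumption:

```python
import pandas as pd

phones = pd.Series(['+48 123 456 789', '(123) 456-789', '123.456.789'])

# \D matches any non-digit character; regex=True enables pattern matching.
digits = phones.str.replace(r'\D', '', regex=True)
```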

Date and Time
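
Differently formatted date strings can be brought to a common type by mapping each value through pd.Timestamp, as in the Mapping examples earlier. A sketch with made-up inputs:

```python
import pandas as pd

raw = pd.Series(['2000-01-01', '2000/01/02', 'January 3, 2000'])

# Each string is parsed on its own, so the formats may differ.
parsed = raw.map(pd.Timestamp)
```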