4.8. Series Slice

4.8.1. SetUp

>>> import pandas as pd

4.8.2. Numeric Index

  • Series[] is used to slice the series

  • Series.iloc[] can be used to slice the series by integer position

  • When slicing by position, the upper bound is exclusive!

  • A series with a numeric index can also be sliced by label using Series.loc[], where the upper bound is inclusive

SetUp:

>>> s = pd.Series(
...     data=[1.0, 2.0, 3.0, 4.0, 5.0],
...     index=[0, 1, 2, 3, 4],
... )
>>>
>>> s
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
dtype: float64

First two elements:

>>> s[:2]
0    1.0
1    2.0
dtype: float64

All elements from position 2 onward:

>>> s[2:]
2    3.0
3    4.0
4    5.0
dtype: float64

All elements starting from position 1, except the last two:

>>> s[1:-2]
1    2.0
2    3.0
dtype: float64

Every second element:

>>> s[::2]
0    1.0
2    3.0
4    5.0
dtype: float64

Every second element, starting from the second one (position 1, because counting starts at 0):

>>> s[1::2]
1    2.0
3    4.0
dtype: float64
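
The difference between an exclusive and an inclusive upper bound is easiest to see when the same bounds are passed to Series.iloc[] and Series.loc[]. The following is a minimal sketch that reuses the series from this section; the names by_position and by_label are illustrative only.

```python
import pandas as pd

s = pd.Series(
    data=[1.0, 2.0, 3.0, 4.0, 5.0],
    index=[0, 1, 2, 3, 4],
)

# Position-based slice: the stop position (3) is excluded
by_position = s.iloc[1:3]

# Label-based slice: the stop label (3) is included
by_label = s.loc[1:3]

print(by_position.tolist())  # [2.0, 3.0]
print(by_label.tolist())     # [2.0, 3.0, 4.0]
```

Spelling out .iloc[] or .loc[] makes the intent explicit, which helps when the index labels are integers and could be mistaken for positions.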

4.8.3. String Index

  • Series[] is used to slice the series

  • Series.loc[] can be used to slice the series using string index

  • When slicing by string label, both the lower and the upper bound are inclusive!

  • A series with a string index can also be sliced by integer position

>>> s = pd.Series(
...     data=[1.0, 2.0, 3.0, 4.0, 5.0],
...     index=['a', 'b', 'c', 'd', 'e'],
... )
>>>
>>> s
a    1.0
b    2.0
c    3.0
d    4.0
e    5.0
dtype: float64
>>> s['a':'d']
a    1.0
b    2.0
c    3.0
d    4.0
dtype: float64
>>> s['a':'d':2]
a    1.0
c    3.0
dtype: float64
>>> s['a':'d':'b']
Traceback (most recent call last):
TypeError: '>=' not supported between instances of 'str' and 'int'
>>> s['d':'a']
Series([], dtype: float64)
>>> s[:2]
a    1.0
b    2.0
dtype: float64
>>> s[2:]
c    3.0
d    4.0
e    5.0
dtype: float64
>>> s[1:-2]
b    2.0
c    3.0
dtype: float64
>>> s[::2]
a    1.0
c    3.0
e    5.0
dtype: float64
>>> s[1::2]
b    2.0
d    4.0
dtype: float64
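
The same slices can be written with explicit accessors, which separates label-based from position-based selection. This is a minimal sketch using the series defined above; the variable names are illustrative only. It also shows that a label slice running from a later label to an earlier one needs a negative step to return anything.

```python
import pandas as pd

s = pd.Series(
    data=[1.0, 2.0, 3.0, 4.0, 5.0],
    index=['a', 'b', 'c', 'd', 'e'],
)

# Label-based: both 'b' and 'd' are included
by_label = s.loc['b':'d']

# Position-based: position 3 is excluded
by_position = s.iloc[1:3]

# Reversed label slice: requires a negative step
reversed_labels = s.loc['d':'a':-1]

print(by_label.index.tolist())         # ['b', 'c', 'd']
print(by_position.index.tolist())      # ['b', 'c']
print(reversed_labels.index.tolist())  # ['d', 'c', 'b', 'a']
```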
>>> s = pd.Series(
...     data=[1.0, 2.0, 3.0, 4.0, 5.0],
...     index=['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
... )
>>>
>>> s
aaa    1.0
bbb    2.0
ccc    3.0
ddd    4.0
eee    5.0
dtype: float64
>>>
>>> s['a':'b']
aaa    1.0
dtype: float64
>>>
>>> s['a':'c']
aaa    1.0
bbb    2.0
dtype: float64
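
The slice bounds 'a', 'b' and 'c' are not labels in this index. Because the index is sorted, pandas can still locate where the bounds would fall, and the result follows ordinary string comparison ('a' < 'aaa' < 'b' < 'bbb' < 'c' < 'ccc'). The sketch below reproduces the result with a plain comparison; the names selected and expected are illustrative only.

```python
import pandas as pd

s = pd.Series(
    data=[1.0, 2.0, 3.0, 4.0, 5.0],
    index=['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
)

# Label slice with bounds that do not exist in the index
selected = s.loc['a':'c']

# The same selection expressed as a string comparison
expected = [label for label in s.index if 'a' <= label <= 'c']

print(selected.index.tolist())  # ['aaa', 'bbb']
print(expected)                 # ['aaa', 'bbb']
```

This relies on the index being sorted. On an unsorted index, slicing with a label that is not present raises a KeyError.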

4.8.4. Date Index

  • Series[] can be used to slice the series using date index

  • Series.loc[] can be used to slice the series using date index

  • When slicing by date, both the lower and the upper bound are inclusive!

  • A series with a date index can also be sliced by integer position

>>> s = pd.Series(
...     data=[1.0, 2.0, 3.0, 4.0, 5.0],
...     index=pd.date_range('1999-12-30', periods=5),
... )
>>>
>>> s
1999-12-30    1.0
1999-12-31    2.0
2000-01-01    3.0
2000-01-02    4.0
2000-01-03    5.0
Freq: D, dtype: float64
>>> s['2000-01-02':'2000-01-04']
2000-01-02    4.0
2000-01-03    5.0
Freq: D, dtype: float64
>>> s['1999-12-30':'2000-01-04':2]
1999-12-30    1.0
2000-01-01    3.0
2000-01-03    5.0
Freq: 2D, dtype: float64
>>> s['1999-12-30':'2000-01-04':-1]
Series([], Freq: -1D, dtype: float64)
>>> s['2000-01-04':'1999-12-30':-1]
2000-01-03    5.0
2000-01-02    4.0
2000-01-01    3.0
1999-12-31    2.0
1999-12-30    1.0
Freq: -1D, dtype: float64
>>> s[:'1999']
1999-12-30    1.0
1999-12-31    2.0
Freq: D, dtype: float64
>>> s['2000':]
2000-01-01    3.0
2000-01-02    4.0
2000-01-03    5.0
Freq: D, dtype: float64
>>> s[:'1999-12']
1999-12-30    1.0
1999-12-31    2.0
Freq: D, dtype: float64
>>> s['2000-01':]
2000-01-01    3.0
2000-01-02    4.0
2000-01-03    5.0
Freq: D, dtype: float64
>>> s[:'2000-01-02']
1999-12-30    1.0
1999-12-31    2.0
2000-01-01    3.0
2000-01-02    4.0
Freq: D, dtype: float64
>>> s['2000-01-02':]
2000-01-02    4.0
2000-01-03    5.0
Freq: D, dtype: float64
>>> s['1999-12':'1999-12']
1999-12-30    1.0
1999-12-31    2.0
Freq: D, dtype: float64
>>> s['2000-01':'2000-01-05']
2000-01-01    3.0
2000-01-02    4.0
2000-01-03    5.0
Freq: D, dtype: float64
>>> s[:'2000-01-05':2]
1999-12-30    1.0
2000-01-01    3.0
2000-01-03    5.0
Freq: 2D, dtype: float64
>>> s[:'2000-01-03':-1]
2000-01-03    5.0
Freq: -1D, dtype: float64
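
The date slices above can also be written with Series.loc[], which states explicitly that the bounds are labels. The following is a minimal sketch using the same series; the variable names are illustrative only. A partial date string such as '1999-12' stands for the whole month, and a bound that lies beyond the last date simply selects up to the end.

```python
import pandas as pd

s = pd.Series(
    data=[1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range('1999-12-30', periods=5),
)

# Partial string: every entry in December 1999
december = s.loc['1999-12']

# Open-ended slice: everything from the year 2000 onward
from_new_year = s.loc['2000':]

# Upper bound after the last date in the index
tail = s.loc['2000-01-02':'2000-01-04']

print(december.tolist())       # [1.0, 2.0]
print(from_new_year.tolist())  # [3.0, 4.0, 5.0]
print(tail.tolist())           # [4.0, 5.0]
```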

Although it has a DatetimeIndex, this series can also be sliced by integer position.

>>> s[1:3]
1999-12-31    2.0
2000-01-01    3.0
Freq: D, dtype: float64
>>>
>>> s[:3]
1999-12-30    1.0
1999-12-31    2.0
2000-01-01    3.0
Freq: D, dtype: float64
>>>
>>> s[:3:2]
1999-12-30    1.0
2000-01-01    3.0
Freq: 2D, dtype: float64
>>>
>>> s[::-1]
2000-01-03    5.0
2000-01-02    4.0
2000-01-01    3.0
1999-12-31    2.0
1999-12-30    1.0
Freq: -1D, dtype: float64

4.8.5. Assignments

Code 4.76. Solution
"""
* Assignment: Series Slice Datetime
* Complexity: easy
* Lines of code: 1 line
* Time: 3 min

English:
    1. Given is `s: pd.Series` with dates since 2000
    2. Define `result: pd.Series` with values for dates between 2000-02-14 and end of February 2000
    3. Run doctests - all must succeed

Polish:
    1. Dany jest `s: pd.Series` z datami od 2000 roku
    2. Zdefiniuj `result: pd.Series` z wartościami pomiędzy datami od 2000-02-14 do końca lutego 2000
    3. Uruchom doctesty - wszystkie muszą się powieść

Run:
    * PyCharm: right-click in the editor and pick `Run Doctest in myfile`
    * PyCharm: `Control + Shift + R`
    * Terminal: `python -m doctest -v myfile.py`

Hints:
    * `pd.Series.loc[]`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> pd.set_option('display.width', 500)
    >>> pd.set_option('display.max_columns', 10)
    >>> pd.set_option('display.max_rows', 10)

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is pd.Series, \
    'Variable `result` has invalid type, should be `pd.Series`'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    2000-02-14   -0.509652
    2000-02-15   -0.438074
    2000-02-16   -1.252795
    2000-02-17    0.777490
    2000-02-18   -1.613898
                    ...
    2000-02-25    0.428332
    2000-02-26    0.066517
    2000-02-27    0.302472
    2000-02-28   -0.634322
    2000-02-29   -0.362741
    Freq: D, Length: 16, dtype: float64
"""

import pandas as pd
import numpy as np
np.random.seed(0)


s = pd.Series(
    data=np.random.randn(100),
    index=pd.date_range('2000-01-01', freq='D', periods=100))

# Define `result: pd.Series` with values for
# dates between 2000-02-14 and end of February 2000
# type: pd.Series
result = ...

Code 4.77. Solution
"""
* Assignment: Slicing Slice Str
* Complexity: easy
* Lines of code: 2 lines
* Time: 5 min

English:
    1. Find the middle element of `s: pd.Series`
    2. Slice from series 5 elements:
        a. two elements before middle
        b. one middle element
        c. two elements after middle
    3. Run doctests - all must succeed

Polish:
    1. Znajdź środkowy element `s: pd.Series`
    2. Wytnij z serii 5 elementów:
        a. dwa elementy przed środkowym
        b. jeden środkowy element
        c. dwa elementy za środkowym
    3. Uruchom doctesty - wszystkie muszą się powieść

Run:
    * PyCharm: right-click in the editor and pick `Run Doctest in myfile`
    * PyCharm: `Control + Shift + R`
    * Terminal: `python -m doctest -v myfile.py`

Hints:
    * `pd.Series.iloc[]`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is pd.Series, \
    'Variable `result` has invalid type, should be `pd.Series`'

    >>> result
    j    97
    k    80
    l    98
    m    98
    n    22
    o    68
    p    75
    dtype: int64
"""

import pandas as pd
import numpy as np
np.random.seed(0)


s = pd.Series(
    data=np.random.randint(10, 100, size=26),
    index=['a', 'b', 'c', 'd', 'e', 'f', 'g',
           'h', 'i', 'j', 'k', 'l', 'm', 'n',
           'o', 'p', 'q', 'r', 's', 't', 'u',
           'v', 'w', 'x', 'y', 'z']
)


# type: pd.Series
result = ...