5.5. DataFrame Getitem
.at[]
- takestuple[str,str]
as argument.loc[]
- takestuple[str,str]
as argument.iat[]
- takestuple[int,int]
as argument.iloc[]
- takestuple[int,int]
as argument
5.5.1. SetUp
>>> import pandas as pd
>>> import numpy as np
>>> np.random.seed(0)
>>>
>>>
>>> df = pd.DataFrame(
... columns = ['Morning', 'Noon', 'Evening', 'Midnight'],
... index = pd.date_range('1999-12-30', periods=7),
... data = np.random.randn(7, 4))
>>>
>>> df
Morning Noon Evening Midnight
1999-12-30 1.764052 0.400157 0.978738 2.240893
1999-12-31 1.867558 -0.977278 0.950088 -0.151357
2000-01-01 -0.103219 0.410599 0.144044 1.454274
2000-01-02 0.761038 0.121675 0.443863 0.333674
2000-01-03 1.494079 -0.205158 0.313068 -0.854096
2000-01-04 -2.552990 0.653619 0.864436 -0.742165
2000-01-05 2.269755 -1.454366 0.045759 -0.187184
5.5.2. Columns
Single Column:
>>> df.Morning
1999-12-30 1.764052
1999-12-31 1.867558
2000-01-01 -0.103219
2000-01-02 0.761038
2000-01-03 1.494079
2000-01-04 -2.552990
2000-01-05 2.269755
Freq: D, Name: Morning, dtype: float64
>>> df['Morning']
1999-12-30 1.764052
1999-12-31 1.867558
2000-01-01 -0.103219
2000-01-02 0.761038
2000-01-03 1.494079
2000-01-04 -2.552990
2000-01-05 2.269755
Freq: D, Name: Morning, dtype: float64
>>> df.loc[:, 'Morning']
1999-12-30 1.764052
1999-12-31 1.867558
2000-01-01 -0.103219
2000-01-02 0.761038
2000-01-03 1.494079
2000-01-04 -2.552990
2000-01-05 2.269755
Freq: D, Name: Morning, dtype: float64
>>> df.iloc[:, 0]
1999-12-30 1.764052
1999-12-31 1.867558
2000-01-01 -0.103219
2000-01-02 0.761038
2000-01-03 1.494079
2000-01-04 -2.552990
2000-01-05 2.269755
Freq: D, Name: Morning, dtype: float64
Multiple columns:
>>> df[['Morning', 'Evening']]
Morning Evening
1999-12-30 1.764052 0.978738
1999-12-31 1.867558 0.950088
2000-01-01 -0.103219 0.144044
2000-01-02 0.761038 0.443863
2000-01-03 1.494079 0.313068
2000-01-04 -2.552990 0.864436
2000-01-05 2.269755 0.045759
5.5.3. Rows
df['2000-01-05']
will imply to take column with name 2000-01-05
,
hence KeyError
:
>>> df['2000-01-05']
Traceback (most recent call last):
KeyError: '2000-01-05'
>>> df.loc['2000-01-05']
Morning 2.269755
Noon -1.454366
Evening 0.045759
Midnight -0.187184
Name: 2000-01-05 00:00:00, dtype: float64
>>> df.loc['2000-01']
Morning Noon Evening Midnight
2000-01-01 -0.103219 0.410599 0.144044 1.454274
2000-01-02 0.761038 0.121675 0.443863 0.333674
2000-01-03 1.494079 -0.205158 0.313068 -0.854096
2000-01-04 -2.552990 0.653619 0.864436 -0.742165
2000-01-05 2.269755 -1.454366 0.045759 -0.187184
>>> df.loc['1999']
Morning Noon Evening Midnight
1999-12-30 1.764052 0.400157 0.978738 2.240893
1999-12-31 1.867558 -0.977278 0.950088 -0.151357
5.5.4. Columns by Index
>>> df.iloc[:, 1]
1999-12-30 0.400157
1999-12-31 -0.977278
2000-01-01 0.410599
2000-01-02 0.121675
2000-01-03 -0.205158
2000-01-04 0.653619
2000-01-05 -1.454366
Freq: D, Name: Noon, dtype: float64
>>> df.iloc[:, [1,2]]
Noon Evening
1999-12-30 0.400157 0.978738
1999-12-31 -0.977278 0.950088
2000-01-01 0.410599 0.144044
2000-01-02 0.121675 0.443863
2000-01-03 -0.205158 0.313068
2000-01-04 0.653619 0.864436
2000-01-05 -1.454366 0.045759