5.23. DataFrame Plotting

Important

.plot()
https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html

../../_images/matplotlib-figure-anatomy.png

5.23.1. Plot kinds

line - Line Plot
bar - Vertical Bar Plot
barh - Horizontal Bar Plot
hist - Histogram
box - Boxplot
density, kde - Kernel Density Estimation Plot
area - Area Plot
pie - Pie Plot
scatter - Scatter Plot
hexbin - Hexbin Plot
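
Each kind listed above can also be reached as a method of the `.plot` accessor, for example `df.plot.bar()` instead of `df.plot(kind='bar')`. A minimal sketch on a small hypothetical DataFrame (the values are made up for illustration, and the `Agg` backend is selected only so the snippet runs without a display):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical measurements, used only for illustration
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 6.3, 5.8],
    'sepal_width': [3.5, 3.0, 3.3, 2.7],
})

# Both calls request a vertical bar plot; each creates its own figure
by_keyword = df.plot(kind='bar')
by_accessor = df.plot.bar()

# Each Axes holds one bar (a Rectangle patch) per cell of the DataFrame
print(len(by_keyword.patches), len(by_accessor.patches))

plt.close('all')
```

The keyword form is convenient when the kind is stored in a variable; the accessor form tends to work better with editor autocompletion.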

5.23.2. Parameters

Table 5.3. Parameters
Parameter	Default value
x	`None`
y	`None`
kind	`line`
ax	`None`
subplots	`False`
sharex	`None`
sharey	`False`
layout	`None`
figsize	`None`
use_index	`True`
title	`None`
grid	`None`
legend	`True`
style	`None`
logx	`False`
logy	`False`
loglog	`False`
xticks	`None`
yticks	`None`
xlim	`None`
ylim	`None`
rot	`None`
fontsize	`None`
colormap	`None`
table	`False`
yerr	`None`
xerr	`None`
secondary_y	`False`
sort_columns	`False`
xlabel	`None`
ylabel	`None`

Table 5.4. Parameters
Parameter	Type	Default	Description
`data`	Series or DataFrame	None	The object for which the method is called
`x`	label or position	None	Only used if data is a DataFrame
`y`	label, position or list of label, positions	None	Allows plotting of one column versus another. Only used if data is a DataFrame.
`kind`	str	`line`	`line`, `bar`, `barh`, `hist`, `box`, `kde`, `density`, `area`, `pie`, `scatter`, `hexbin`
`figsize`	tuple	None	(width, height) in inches
`use_index`	bool	True	Use index as ticks for x axis
`title`	str or list	None	Title to use for the plot. If a string is passed, print the string at the top of the figure. If a list is passed and subplots is True, print each item in the list above the corresponding subplot.
`grid`	bool	None	(matlab style default) Axis grid lines
`legend`	bool or 'reverse'	True	Place legend on axis subplots
`style`	list or dict	None	matplotlib line style per column
`logx`	bool or 'sym'	False	Use log scaling or symlog scaling on x axis
`logy`	bool or 'sym'	False	Use log scaling or symlog scaling on y axis
`loglog`	bool or 'sym'	False	Use log scaling or symlog scaling on both x and y axes
`xticks`	sequence	None	Values to use for the xticks
`yticks`	sequence	None	Values to use for the yticks
`xlim`	2-tuple/list	None	Set the x limits of the current axes
`ylim`	2-tuple/list	None	Set the y limits of the current axes
`rot`	int	None	Rotation for ticks (xticks for vertical, yticks for horizontal plots)
`fontsize`	int	None	Font size for xticks and yticks
`colormap`	str or matplotlib colormap object	None	Colormap to select colors from. If string, load colormap with that name from matplotlib.
`colorbar`	bool	None	If True, plot colorbar (only relevant for 'scatter' and 'hexbin' plots)
`position`	float	0.5 (center)	Specify relative alignments for bar plot layout. From 0 (left/bottom-end) to 1 (right/top-end).
`table`	bool, Series or DataFrame	False	If True, draw a table using the data in the DataFrame and the data will be transposed to meet matplotlib's default layout. If a Series or DataFrame is passed, use passed data to draw a table.
`yerr`	DataFrame, Series, array-like, dict or str	None	Error bar values for the y axis (see "Plotting with Error Bars" in the pandas user guide).
`xerr`	DataFrame, Series, array-like, dict or str	None	Equivalent to yerr, for the x axis.
`mark_right`	bool	True	When using a secondary_y axis, automatically mark the column labels with "(right)" in the legend.
`**kwds`	keywords	None	Options to pass to matplotlib plotting method.
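
A short sketch of how several of the parameters above combine in one call. The DataFrame, its column names and the label texts are hypothetical; the `Agg` backend is selected only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical readings, used only for illustration
df = pd.DataFrame({
    'day': [1, 2, 3, 4],
    'temperature': [10.5, 12.0, 9.8, 11.2],
    'humidity': [60, 55, 70, 65],
})

# Create the Axes up front and hand it to pandas through `ax`
fig, ax = plt.subplots(figsize=(8, 4))

result = df.plot(
    x='day',                         # column used for the x axis
    y=['temperature', 'humidity'],   # columns drawn as separate lines
    kind='line',
    ax=ax,
    title='Weather',
    xlabel='day of month',
    ylabel='value',
    grid=True,
    legend=True,
    rot=45,                          # rotate the x tick labels
)

print(result.get_title(), len(result.lines))
plt.close(fig)
```

Passing `ax` is useful when several pandas plots should share one figure prepared with `plt.subplots()`.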

5.23.3. SetUp

>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>>
>>>
>>> DATA = 'https://python3.info/_static/iris-clean.csv'
>>>
>>> df = pd.read_csv(DATA)

5.23.4. Line Plot

Default settings:

>>> plot = df.plot(kind='line')
>>> plt.show()

../../_images/pandas-dataframe-plot-line.png — Figure 5.15. Line Plot

>>> plot = df.plot(kind='line', subplots=True)
>>> plt.show()

../../_images/pandas-dataframe-plot-line-subplots.png — Figure 5.16. Line Plot with Subplots

>>> plot = df.plot(kind='line',
...                subplots=True,
...                layout=(2,2),
...                sharex=True,
...                sharey=True)
>>> plt.show()

../../_images/pandas-dataframe-plot-line-layout.png — Figure 5.17. Line Plot with Subplots and Layout
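
With `subplots=True` the call returns a NumPy array of Axes rather than a single Axes, and with `layout=(2, 2)` that array is two-dimensional. A minimal sketch on a hypothetical four-column DataFrame (column names and values are made up; the `Agg` backend is selected only so the snippet runs without a display):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: four numeric columns, one subplot each
df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [3, 2, 1],
    'c': [2, 2, 2],
    'd': [1, 3, 1],
})

axes = df.plot(kind='line',
               subplots=True,
               layout=(2, 2),
               sharex=True,
               sharey=True)

print(type(axes).__name__, axes.shape)

# Individual subplots can be customized by iterating over the array
for ax in axes.flat:
    ax.grid(True)

plt.close('all')
```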

5.23.5. Vertical Bar Plot

>>> plot = df.plot(kind='bar', subplots=True, layout=(2,2))
>>> plt.show()

../../_images/pandas-dataframe-plot-bar.png — Figure 5.18. Vertical Bar Plot

5.23.6. Horizontal Bar Plot

>>> plot = df.plot(kind='barh',
...                title='Iris',
...                ylabel='centimeters',
...                xlabel='iris',
...                subplots=True,
...                layout=(2,2),
...                sharex=True,
...                sharey=True,
...                legend=True,
...                grid=True,
...                figsize=(10,10))
>>> plt.show()

../../_images/pandas-dataframe-plot-barh.png — Figure 5.19. Horizontal Bar Plot

5.23.7. Histogram

>>> plot = df.plot(kind='hist',
...                rwidth=0.8,
...                xlabel='centimeters',
...                title='Iris Dimensions Frequency')
>>> plt.show()

../../_images/pandas-dataframe-plot-hist.png — Figure 5.20. Histogram

>>> plot = df.plot(kind='hist',
...                rwidth=0.8,
...                xlabel='centimeters',
...                title='Iris Dimensions Frequency',
...                subplots=True,
...                layout=(2,2),
...                sharex=True,
...                sharey=True)
>>> plt.show()

../../_images/pandas-dataframe-plot-hist-layout.png — Figure 5.21. Histogram

>>> plot = df.hist()
>>> plt.show()

>>> plot = df['sepal_length'].hist(bins=3,
...                                rwidth=0.8,
...                                legend=None,
...                                grid=False)
>>>
>>> _ = plot.xaxis.set_ticks(ticks=[4.9, 6.1, 7.3],
...                          labels=['small', 'medium', 'large'])
>>> plt.show()

../../_images/pandas-dataframe-plot-hist-categories.png — Figure 5.23. Visualization using hist
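
The tick positions 4.9, 6.1 and 7.3 used above are consistent with the centers of three equal-width bins over a column that spans 4.3 to 7.9. That range is an assumption about the data, not something stated in this section. Under that assumption, the centers can be derived instead of hard-coded; the sample values below are hypothetical:

```python
import numpy as np

# Hypothetical values; only the minimum (4.3) and maximum (7.9) matter here
values = np.array([4.3, 5.0, 5.8, 6.4, 7.1, 7.9])

# Edges of three equal-width bins between min and max
edges = np.histogram_bin_edges(values, bins=3)

# The center of each bin is the midpoint of its two edges
centers = (edges[:-1] + edges[1:]) / 2

print(edges)
print(centers)
```

The resulting `centers` array can be passed as `ticks=` to `set_ticks`, so the labels stay aligned with the bars if the data or the number of bins changes.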

5.23.8. Boxplot

>>> plot = df.plot(kind='box')
>>> plt.show()

../../_images/pandas-dataframe-plot-box.png — Figure 5.24. Boxplot

>>> plot = df.plot(kind='box',
...                subplots=True,
...                layout=(2,2),
...                sharex=False,
...                sharey=False)
>>>
>>> plt.show()

../../_images/pandas-dataframe-plot-box-layout.png — Figure 5.25. Boxplot with layout

5.23.9. Kernel Density Estimation Plot

kind='density' is an alias for kind='kde' (Kernel Density Estimation).

>>> plot = df.plot(kind='density')
>>> plt.show()

../../_images/pandas-dataframe-plot-density.png — Figure 5.26. Kernel Density Estimation Plot

>>> plot = df.plot(
...     kind='density',
...     subplots=True,
...     layout=(2,2),
...     sharex=False,
... )
>>>
>>> plt.subplots_adjust(hspace=0.5, wspace=0.5)  # margins between charts
>>> plt.show()

../../_images/pandas-dataframe-plot-density-margin.png — Figure 5.27. Density plot with margins

5.23.10. Area Plot

>>> plot = df.plot(kind='area')
>>> plt.show()

../../_images/pandas-dataframe-plot-area.png — Figure 5.28. Area Plot

../../_images/pandas-dataframe-plot-cumulative-flow-diagram.png — Figure 5.29. Cumulative Flow Diagram in Atlassian Jira

5.23.11. Pie Plot

List of Matplotlib color names [1]

../../_images/matplotlib-colors.png — Figure 5.30. List of Matplotlib color names [1]

>>> data = pd.cut(df['sepal_length'],
...               bins=[3, 5, 7, np.inf],
...               labels=['small', 'medium', 'large'],
...               include_lowest=True).value_counts()
>>>
>>> plot = data.plot(kind='pie',
...                  autopct='%1.0f%%',
...                  colors=['plum', 'violet', 'magenta'],
...                  explode=[0.1, 0, 0],
...                  shadow=True,
...                  startangle=-215,
...                  xlabel=None,
...                  ylabel=None,
...                  title='sepal_length\nsmall: 3.0 to 5.0\nmedium: 5.0 to 7.0\nlarge: 7.0 to inf',
...                  figsize=(10,10))
>>>
>>> plt.show()

../../_images/pandas-dataframe-plot-pie.png — Figure 5.31. Pie Plot
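
With `bins=[3, 5, 7, np.inf]`, `pd.cut` builds right-closed intervals by default, so a value of exactly 5.0 lands in the first bin and exactly 7.0 in the second. A minimal sketch of the binning step alone, on hypothetical values chosen to sit on and around the edges:

```python
import numpy as np
import pandas as pd

# Hypothetical lengths, picked to show what happens at the bin edges
lengths = pd.Series([4.3, 5.0, 5.1, 6.9, 7.0, 7.7], name='sepal_length')

sizes = pd.cut(lengths,
               bins=[3, 5, 7, np.inf],
               labels=['small', 'medium', 'large'],
               include_lowest=True)   # also include the left edge (3) in 'small'

# Number of observations per category; this is what the pie plot draws
counts = sizes.value_counts()
print(counts)
```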

5.23.12. Scatter Plot

>>> plot = df.plot(kind='scatter', x='sepal_length', y='sepal_width')
>>> plt.show()

../../_images/pandas-dataframe-plot-scatter-sepal.png — Figure 5.32. Scatter plot: sepal_length vs. sepal_width

>>> plot = df.plot(kind='scatter', x='petal_length', y='petal_width')
>>> plt.show()

../../_images/pandas-dataframe-plot-scatter-petal.png — Figure 5.33. Scatter plot: petal_length vs. petal_width

>>> data = df.replace({'setosa': 0,
...                    'virginica': 1,
...                    'versicolor': 2})
>>>
>>> plot = data.plot(kind='scatter',
...                  x='sepal_length',
...                  y='sepal_width',
...                  colormap='viridis',
...                  c='species')
>>> plt.show()

../../_images/pandas-dataframe-plot-scatter-viridis.png — Figure 5.34. Scatter plot using viridis colormap
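
The colormap needs numeric values, which is why the species names are replaced with integers above. As an alternative sketch (not the approach used in this chapter), `pd.factorize` can generate the integer codes without listing every category by hand. The DataFrame and the `species_code` column name are hypothetical; the `Agg` backend is selected only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample with the same column names as the iris data
df = pd.DataFrame({
    'sepal_length': [5.1, 6.3, 5.8, 4.9],
    'sepal_width': [3.5, 3.3, 2.7, 3.0],
    'species': ['setosa', 'virginica', 'versicolor', 'setosa'],
})

# codes: one integer per row, numbered in order of first appearance
# uniques: the category names, indexed by code
codes, uniques = pd.factorize(df['species'])
data = df.assign(species_code=codes)

plot = data.plot(kind='scatter',
                 x='sepal_length',
                 y='sepal_width',
                 c='species_code',
                 colormap='viridis')

print(codes, list(uniques))
plt.close('all')
```

`uniques` keeps the mapping from code back to name, which helps when labeling the colorbar or a legend.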

5.23.13. Hexbin Plot

>>> plot = df.plot(kind='hexbin', x='petal_length', y='petal_width')
>>> plt.show()

../../_images/pandas-dataframe-plot-hexbin.png — Figure 5.35. Hexbin Plot

5.23.14. Scatter matrix

In pandas 0.20 the plotting module was moved from pandas.tools.plotting to pandas.plotting.
In version 0.19 and earlier, pandas.plotting did not exist.

>>> from pandas.plotting import scatter_matrix
>>>
>>> plot = scatter_matrix(df)
>>> plt.show()

../../_images/pandas-dataframe-plot-scattermatrix.png — Figure 5.36. Scatter Matrix

>>> data = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
>>> colors = df['species'].replace({'setosa': 0, 'virginica': 1, 'versicolor': 2})  # colors must be numerical
>>>
>>> plot = scatter_matrix(data, c=colors)
>>> plt.show()

../../_images/pandas-dataframe-plot-scattermatrix-colors.png — Figure 5.37. Scatter Matrix with colors
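
`scatter_matrix` returns a square NumPy array of Axes, one per pair of columns, with a histogram of each column on the diagonal by default. A minimal sketch on randomly generated data (the column names `a`, `b`, `c` and the values are hypothetical; the `Agg` backend is selected only so the snippet runs without a display):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: 30 rows, 3 numeric columns
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(30, 3)), columns=['a', 'b', 'c'])

# One subplot per (row column, column column) pair
axes = scatter_matrix(data, figsize=(6, 6), diagonal='hist')

print(axes.shape)

# axes[i, j] plots column j on the x axis against column i on the y axis
axes[0, 1].set_title('a vs. b')

plt.close('all')
```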

5.23.15. Actinograms

../../_images/pandas-dataframe-actinogram-1.png

../../_images/pandas-dataframe-actinogram-2.png

5.23.16. Further Reading

5.23.17. References

5.23.18. Assignments

# FIXME: the task is too hard, move it to a case study
# FIXME: Write solution
# FIXME: English translation

# %% About
# - Name: DataFrame Plot
# - Difficulty: medium
# - Lines: 15
# - Minutes: 21

# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% English
# 1. Read data from `DATA` as `df: pd.DataFrame`
# 2. Select `Luminance` sheet
# 3. Parse column with dates
# 4. Select desired date and location, then resample by hour
# 5. Display chart (line) with activity hours in "Sleeping Quarters upper" location
# 6. Active is when `Luminance` is not zero
# 7. Easy: for day 2019-09-28
# 8. Advanced: for each day, as subplots
# 9. Run doctests - all must succeed

# %% Polish
# 1. Wczytaj dane z `DATA` jako `df: pd.DataFrame`
# 2. Wybierz arkusz `Luminance`
# 3. Sparsuj kolumny z datami
# 4. Wybierz pożądaną datę i lokację, następnie próbkuj co godzinę
# 5. Aktywność jest gdy `Luminance` jest różna od zera
# 6. Wyświetl wykres (line) z godzinami aktywności w dla lokacji "Sleeping Quarters upper"
# 7. Łatwe: dla dnia 2019-09-28
# 8. Zaawansowane: dla wszystkich dni, jako subplot
# 9. Uruchom doctesty - wszystkie muszą się powieść

# %% Hints
# - `pd.Series.apply(np.sign)`, see NumPy signum
# - `pd.Series.resample('h').sum()`

# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
... 'Python 3.9+ required'

>>> pd.set_option('display.width', 500)
>>> pd.set_option('display.max_columns', 10)
>>> pd.set_option('display.max_rows', 10)

>>> assert result is not Ellipsis, \
... 'Assign result to variable: `result`'
>>> assert type(result) is pd.Series, \
... 'Variable `result` must be a `pd.Series` type'

>>> result  # doctest: +NORMALIZE_WHITESPACE
datetime
2019-09-28 00:00:00+00:00    1
2019-09-28 01:00:00+00:00    1
2019-09-28 02:00:00+00:00    1
2019-09-28 03:00:00+00:00    1
2019-09-28 04:00:00+00:00    0
                            ..
2019-09-28 19:00:00+00:00    1
2019-09-28 20:00:00+00:00    1
2019-09-28 21:00:00+00:00    1
2019-09-28 22:00:00+00:00    1
2019-09-28 23:00:00+00:00    1
Freq: h, Name: value, Length: 24, dtype: int64
"""

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% Imports
import numpy as np
import pandas as pd

# %% Types
result: pd.Series

# %% Data
DATA = 'https://python3.info/_static/aatc-mission-exp12.xlsx'
WHERE = 'Sleeping Quarters upper'
WHEN = '2019-09-28'

# %% Result
result = ...