3.4. Math Statistics

  • statistics module

3.4.1. Mean

Table 3.6. Mean

Function                      Description
----------------------------  --------------------------------------------------
statistics.mean()             Arithmetic mean ('average') of data
statistics.fmean()            Faster, floating-point variant of statistics.mean(), since Python 3.8
statistics.harmonic_mean()    Harmonic mean of data
statistics.geometric_mean()   Geometric mean of data, since Python 3.8

Arithmetic mean ('average') of data:

from statistics import mean


mean([1, 2, 3, 4, 4])
# 2.8
mean([-1.0, 2.5, 3.25, 5.75])
# 2.625
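The table above lists fmean() as a floating-point variant of mean(). A minimal sketch, assuming Python 3.8+ (where fmean() exists): it checks that mean() equals the sum divided by the count for the first example, and shows that mean() keeps an exact Fraction result while fmean() always returns a float. The names `data` and `exact` are illustrative only.

```python
from fractions import Fraction
from statistics import fmean, mean

data = [1, 2, 3, 4, 4]

# The arithmetic mean is the sum of the values divided by their count
print(mean(data) == sum(data) / len(data))  # True

# mean() preserves exact numeric types such as Fraction
exact = [Fraction(1, 2), Fraction(1, 3)]
print(mean(exact))  # 5/12

# fmean() converts everything to float and returns a float
print(type(fmean(exact)).__name__)  # float
```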

Harmonic mean of data:

from statistics import harmonic_mean


harmonic_mean([2.5, 3, 10])
# 3.6
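The other functions from Table 3.6 work the same way. A short sketch, assuming Python 3.8+ for fmean() and geometric_mean(); it also recomputes the harmonic mean by hand as n divided by the sum of reciprocals. The name `by_hand` is illustrative, not part of the module.

```python
import math
from statistics import fmean, geometric_mean, harmonic_mean

# fmean() always returns a float
print(fmean([3.5, 4.0, 5.25]))  # 4.25

# geometric_mean() is the n-th root of the product of n positive values;
# the result can carry a tiny floating-point error, so round it for display
print(round(geometric_mean([54, 24, 36]), 1))  # 36.0

# harmonic_mean() is n divided by the sum of the reciprocals
data = [2.5, 3, 10]
by_hand = len(data) / sum(1 / x for x in data)
print(math.isclose(by_hand, harmonic_mean(data)))  # True
```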

3.4.2. Median

Table 3.7. Median

Function                      Description
----------------------------  --------------------------------------------------
statistics.median()           Median (middle value) of data
statistics.median_low()       Low median of data
statistics.median_high()      High median of data
statistics.median_grouped()   Median, or 50th percentile, of grouped data

Median (middle value) of data:

from statistics import median


median([1, 3, 5])
# 3
median([1, 3, 5, 7])
# 4.0
  • The low median is always a member of the data set.

  • When the number of data points is odd, the middle value is returned.

  • When it is even, the smaller of the two middle values is returned.

Low median of data:

from statistics import median_low


median_low([1, 3, 5])
# 3
median_low([1, 3, 5, 7])
# 3
  • The high median is always a member of the data set.

  • When the number of data points is odd, the middle value is returned.

  • When it is even, the larger of the two middle values is returned.

High median of data:

from statistics import median_high


median_high([1, 3, 5])
# 3
median_high([1, 3, 5, 7])
# 5
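The rules listed above for median(), median_low() and median_high() can be checked against a hand-written version. In this sketch `median_by_hand()` is a hypothetical helper written for illustration only: it sorts the data and averages the two middle values when the count is even.

```python
from statistics import median, median_high, median_low


def median_by_hand(values):
    """Middle value of sorted data; mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [7, 1, 5, 3]  # unsorted input is fine, every function sorts internally
print(median_by_hand(data), median(data))   # 4.0 4.0
print(median_low(data), median_high(data))  # 3 5
```

Both median_low() and median_high() return one of the original values, while median() may return a value that is not in the data.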
  • Median of grouped continuous data.

  • Calculated using interpolation as the 50th percentile.

Median, or 50th percentile, of grouped data:

from statistics import median_grouped


median_grouped([52, 52, 53, 54])
# 52.5
median_grouped([1, 3, 3, 5, 7], interval=1)
# 3.25
median_grouped([1, 3, 3, 5, 7], interval=2)
# 3.5
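The interpolation mentioned above follows the textbook grouped-median formula L + interval * (n/2 - cf) / f, where L is the lower limit of the class holding the middle value, cf is the number of values below that class and f is the number of values inside it. `grouped_median_by_hand()` is a hypothetical helper that mirrors this formula; CPython's own implementation may differ in details such as type coercion and error handling.

```python
from bisect import bisect_left, bisect_right
from statistics import median_grouped


def grouped_median_by_hand(values, interval=1):
    """Interpolated 50th percentile of data grouped into classes of width `interval`."""
    data = sorted(values)
    n = len(data)
    x = data[n // 2]                      # midpoint of the median class
    lower = x - interval / 2              # lower limit of the median class
    below = bisect_left(data, x)          # cumulative frequency before the class
    freq = bisect_right(data, x) - below  # number of values inside the class
    return lower + interval * (n / 2 - below) / freq


data = [1, 3, 3, 5, 7]
print(grouped_median_by_hand(data, interval=1), median_grouped(data, interval=1))  # 3.25 3.25
print(grouped_median_by_hand(data, interval=2), median_grouped(data, interval=2))  # 3.5 3.5
```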

3.4.3. Mode

Table 3.8. Mode

Function                      Description
----------------------------  --------------------------------------------------
statistics.mode()             Mode (most common value) of discrete data
statistics.multimode()        List of the most common values, since Python 3.8
statistics.quantiles()        Divides data or a distribution into equiprobable intervals (e.g. quartiles, deciles, or percentiles), since Python 3.8

Mode (most common value) of discrete data:

from statistics import mode


mode([1, 1, 2, 3, 3, 3, 3, 4])
# 3
mode(["red", "blue", "blue", "red", "green", "red", "red"])
# 'red'
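Table 3.8 also lists multimode() and quantiles(). A minimal sketch, assuming Python 3.8+, where mode() returns the first of several equally common values instead of raising StatisticsError, and quantiles() uses its default 'exclusive' method. The input lists are illustrative.

```python
from statistics import mode, multimode, quantiles

# With a tie, mode() returns the first most common value it encounters
print(mode([1, 1, 2, 2, 3]))       # 1
# multimode() returns all of the most common values, in order of first occurrence
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]

# n=4 returns the three cut points that split the data into quartiles
print(quantiles([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], n=4))  # [2.75, 5.5, 8.25]
```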

3.4.4. Distribution

Table 3.9. Distribution

Function                      Description
----------------------------  --------------------------------------------------
statistics.NormalDist         Tool for creating and manipulating normal distributions of a random variable, since Python 3.8
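A small sketch of the basic NormalDist operations, assuming Python 3.8+. The distribution `iq` with mu=100 and sigma=15 is purely illustrative and not data from this chapter: cdf() gives the probability of a value below x, and inv_cdf() goes the other way, from a probability to a value.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
print(iq.mean, iq.stdev)  # 100.0 15.0

# Half of the probability mass lies below the mean
print(iq.cdf(100))  # 0.5

# Roughly 68% of values fall within one standard deviation of the mean
within_one_sigma = iq.cdf(115) - iq.cdf(85)
print(round(within_one_sigma, 4))  # 0.6827

# inv_cdf() returns the value below which 90% of the mass lies
print(round(iq.inv_cdf(0.9), 2))  # 119.22
```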

3.4.5. Standard Deviation

Table 3.10. Standard Deviation

Function                      Description
----------------------------  --------------------------------------------------
statistics.pstdev()           Population standard deviation of data
statistics.stdev()            Sample standard deviation of data

Sample standard deviation of data:

from statistics import stdev


stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
# 1.0810874155219827
  • The population standard deviation is the square root of the population variance.

Population standard deviation:

from statistics import pstdev


pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])
# 0.986893273527251
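Each standard deviation is the square root of the matching variance. This sketch checks that relationship on the same data as the examples above, using math.isclose() because the two routes can differ in the last floating-point digit. It also shows that, for this non-constant data, the sample estimate is larger than the population one.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75]

# Sample standard deviation vs. sample variance
print(math.isclose(stdev(data), math.sqrt(variance(data))))    # True
# Population standard deviation vs. population variance
print(math.isclose(pstdev(data), math.sqrt(pvariance(data))))  # True

# Dividing by n - 1 (sample) gives a larger result than dividing by n (population)
print(stdev(data) > pstdev(data))  # True
```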

3.4.6. Variance

Table 3.11. Variance

Function                      Description
----------------------------  --------------------------------------------------
statistics.pvariance()        Population variance of data
statistics.variance()         Sample variance of data

Sample variance of data:

from statistics import variance


variance([2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5])
# 1.3720238095238095

Population variance of data:

from statistics import pvariance


pvariance([0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25])
# 1.25
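Both variances start from the sum of squared deviations from the mean; the population variance divides it by n and the sample variance by n - 1. In this sketch `sum_of_squares()` is a hypothetical helper for illustration, and the data is the pvariance() example above.

```python
import math
from statistics import pvariance, variance


def sum_of_squares(values):
    """Sum of squared deviations from the arithmetic mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values)


data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
n = len(data)

# Population variance divides by n
print(math.isclose(sum_of_squares(data) / n, pvariance(data)))        # True
# Sample variance divides by n - 1
print(math.isclose(sum_of_squares(data) / (n - 1), variance(data)))  # True
```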

3.4.7. Examples

from statistics import NormalDist


temperature_feb = NormalDist.from_samples([4, 12, -3, 2, 7, 14])

temperature_feb.mean
# 6.0
temperature_feb.stdev
# 6.356099432828281

# Chance of being under 3 degrees
temperature_feb.cdf(3)  # 0.3184678262814532

# Relative chance of being 7 degrees versus 10 degrees
temperature_feb.pdf(7) / temperature_feb.pdf(10)  # 1.2039930378537762


el_niño = NormalDist(4, 2.5)

# Add in a climate effect
temperature_feb += el_niño

temperature_feb
# NormalDist(mu=10.0, sigma=6.830080526611674)

# Convert to Fahrenheit
temperature_feb * (9/5) + 32
# NormalDist(mu=50.0, sigma=12.294144947901014)

# Generate random samples (the values differ on every run)
temperature_feb.samples(3)
# [7.672102882379219, 12.000027119750287, 4.647488369766392]
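Because samples() is random, its output above will not match your run. NormalDist.samples() also accepts a keyword-only `seed`, which makes the draw reproducible. In this sketch the distribution is rebuilt from the printed mu and sigma so the block is self-contained; the seed value 42 is arbitrary.

```python
from statistics import NormalDist

temperature_feb = NormalDist(mu=10.0, sigma=6.830080526611674)

# The same seed produces the same "random" samples on every run
first = temperature_feb.samples(3, seed=42)
second = temperature_feb.samples(3, seed=42)
print(first == second)  # True
print(len(first))       # 3
```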

3.4.8. Assignments

# %% About
# - Name: Math Statistics Stats
# - Difficulty: easy
# - Lines: 11
# - Minutes: 13

# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% English
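# 1. Define function `stats(values)` which returns a dict with keys:
#    - 'mean',
#    - 'stdev' (standard deviation),
#    - 'median',
#    - 'variance'.
# 2. Keep the key order above, because the doctests compare the printed dict
# 3. The doctests call `stats()` for columns:
#    - sepal_length,
#    - sepal_width,
#    - petal_length,
#    - petal_width.
# 4. Use the `statistics` module from the Python standard library
# 5. Run doctests - all must succeed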

# %% Hints
# - Note that the `stdev` of `petal_length` differs in the last digit between Python versions:
# - Python 3.10: 1.8602739173624534
# - Python 3.11: 1.8602739173624532 (the value expected by the doctests)


# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> stats(sepal_length)
{'mean': 5.833333333333333, 'stdev': 0.9084785816591018, 'median': 5.7, 'variance': 0.8253333333333333}
>>> stats(sepal_width)
{'mean': 3.0619047619047617, 'stdev': 0.36670995415476587, 'median': 3.0, 'variance': 0.1344761904761905}
>>> stats(petal_length)
{'mean': 3.8523809523809525, 'stdev': 1.8602739173624532, 'median': 4.5, 'variance': 3.4606190476190477}
>>> stats(petal_width)
{'mean': 1.2333333333333334, 'stdev': 0.7741662181555931, 'median': 1.4, 'variance': 0.5993333333333334}
"""

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% Imports
from statistics import mean, stdev, variance, median

# %% Types
from typing import Callable
stats: Callable[[list[float]], dict[str, float]]

# %% Data
DATA = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
    (7.1, 3.0, 5.9, 2.1, 'virginica'),
    (4.6, 3.4, 1.4, 0.3, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (5.0, 3.6, 1.4, 0.3, 'setosa'),
    (5.5, 2.3, 4.0, 1.3, 'versicolor'),
    (6.5, 3.0, 5.8, 2.2, 'virginica'),
    (6.5, 2.8, 4.6, 1.5, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (6.9, 3.1, 4.9, 1.5, 'versicolor'),
    (4.6, 3.1, 1.5, 0.2, 'setosa'),
]

header, *rows = DATA
sepal_length = [row[0] for row in rows]
sepal_width = [row[1] for row in rows]
petal_length = [row[2] for row in rows]
petal_width = [row[3] for row in rows]

# %% Result
def stats(values):
    ...

# FIXME: rewrite this assignment, it is too complicated

# %% About
# - Name: Math Statistics Iris
# - Difficulty: easy
# - Lines: 30
# - Minutes: 21

# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% English
# 1. Create dict `result: dict[str, dict]`
# 2. For each species and each numerical column, store:
#    - values (list of the column's values for that species),
#    - mean,
#    - median,
#    - standard deviation (key 'stdev'),
#    - variance.
# 3. Key `result` by species, then by column name
# 4. Non-functional requirements:
#    - Use the `statistics` module from the Python standard library
# 5. Run doctests - all must succeed

# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> result  # doctest: +NORMALIZE_WHITESPACE
{'virginica': {'sepal_length': {'values': [5.8, 6.3, 7.6, 4.9, 7.1, 6.5, 6.3],
                                'mean': 6.357142857142857,
                                'median': 6.3,
                                'stdev': 0.871506631944823,
                                'variance': 0.7595238095238092},
               'sepal_width': {'values': [2.7, 2.9, 3.0, 2.5, 3.0, 3.0, 3.3],
                               'mean': 2.914285714285714,
                               'median': 3.0,
                               'stdev': 0.25448360411214066,
                               'variance': 0.06476190476190473},
               'petal_length': {'values': [5.1, 5.6, 6.6, 4.5, 5.9, 5.8, 6.0],
                                'mean': 5.642857142857142,
                                'median': 5.8,
                                'stdev': 0.6754187413675136,
                                'variance': 0.45619047619047615},
               'petal_width': {'values': [1.9, 1.8, 2.1, 1.7, 2.1, 2.2, 2.5],
                               'mean': 2.0428571428571427,
                               'median': 2.1,
                               'stdev': 0.26992062325273125,
                               'variance': 0.07285714285714287}},
 'setosa': {'sepal_length': {'values': [5.1, 4.7, 4.9, 4.6, 5.4, 5.0, 4.6],
                             'mean': 4.9,
                             'median': 4.9,
                             'stdev': 0.2943920288775951,
                             'variance': 0.08666666666666677},
            'sepal_width': {'values': [3.5, 3.2, 3.0, 3.4, 3.9, 3.6, 3.1],
                            'mean': 3.3857142857142857,
                            'median': 3.4,
                            'stdev': 0.31320159337914943,
                            'variance': 0.09809523809523807},
            'petal_length': {'values': [1.4, 1.3, 1.4, 1.4, 1.7, 1.4, 1.5],
                             'mean': 1.4428571428571428,
                             'median': 1.4,
                             'stdev': 0.12724180205607036,
                             'variance': 0.01619047619047619},
            'petal_width': {'values': [0.2, 0.2, 0.2, 0.3, 0.4, 0.3, 0.2],
                            'mean': 0.2571428571428572,
                            'median': 0.2,
                            'stdev': 0.07867957924694431,
                            'variance': 0.006190476190476191}},
   'versicolor': {'sepal_length': {'values': [5.7, 6.4, 7.0, 5.7, 5.5, 6.5, 6.9],
                                   'mean': 6.242857142857143,
                                   'median': 6.4,
                                   'stdev': 0.6106202935189289,
                                   'variance': 0.3728571428571429},
                  'sepal_width': {'values': [2.8, 3.2, 3.2, 2.8, 2.3, 2.8, 3.1],
                                  'mean': 2.8857142857142857,
                                  'median': 2.8,
                                  'stdev': 0.31847852585154235,
                                  'variance': 0.10142857142857152},
                  'petal_length': {'values': [4.1, 4.5, 4.7, 4.5, 4.0, 4.6, 4.9],
                                   'mean': 4.4714285714285715,
                                   'median': 4.5,
                                   'stdev': 0.31997023671109237,
                                   'variance': 0.10238095238095248},
                  'petal_width': {'values': [1.3, 1.5, 1.4, 1.3, 1.3, 1.5, 1.5],
                                  'mean': 1.4,
                                  'median': 1.4,
                                  'stdev': 0.09999999999999998,
                                  'variance': 0.009999999999999995}}}
"""

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% Imports
from statistics import mean, stdev, median, variance

# %% Types
result: dict[str, dict]

# %% Data
DATA = [
    ('sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
    (7.1, 3.0, 5.9, 2.1, 'virginica'),
    (4.6, 3.4, 1.4, 0.3, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (5.0, 3.6, 1.4, 0.3, 'setosa'),
    (5.5, 2.3, 4.0, 1.3, 'versicolor'),
    (6.5, 3.0, 5.8, 2.2, 'virginica'),
    (6.5, 2.8, 4.6, 1.5, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (6.9, 3.1, 4.9, 1.5, 'versicolor'),
    (4.6, 3.1, 1.5, 0.2, 'setosa'),
]

# %% Result
result = ...