3.4. To Pickle

  • File paths also work with DATAs

3.4.1. SetUp

>>> import pandas as pd
>>>
>>>
>>> df = pd.DataFrame([
...     {'firstname': 'Mark',    'lastname': 'Watney',   'role': 'botanist'},
...     {'firstname': 'Melissa', 'lastname': 'Lewis',    'role': 'commander'},
...     {'firstname': 'Rick',    'lastname': 'Martinez', 'role': 'pilot'},
... ])
>>>
>>> df
  firstname  lastname       role
0      Mark    Watney   botanist
1   Melissa     Lewis  commander
2      Rick  Martinez      pilot

3.4.2. Example

>>> df.to_pickle('/tmp/myfile.pkl')

$ cat /tmp/myfile.pkl
����pandas.core.frame��    DataFrame���)��}�(�_mgr��
pandas.core.internals.managers�� BlockManager����pandas._libs.internals��_unpickle_block����numpy._core.multiarray�� _reconstruct����numpy��ndarray���K��Cb���R�(KKK��h�dtype����O8�����R�(K�|�NNNJ����J����K?t�b�]�(�Mark��Melissa��Rick��Watney��Lewis�Martinez�botanist�� commander��pilot�et�bbuiltins��slice���KKK��R�K��R���]�(�pandas.core.indexes.base��

_new_Index���h2�Index���}�(�data�hhK��h��R�(KK��h�]�(� firstname�lastname��role�et�b�name�Nu��R�h4�pandas.core.indexes.range�� RangeIndex���}�(hBN�start�K�stop�K�step�Ku��R�e��R��_typ�� dataframe�� _metadata�]��attrs�}��_flags�}��allows_duplicate_labels��sub.
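
The file content above is binary and is not meant to be read by humans. A minimal sketch of a round trip is shown below: it writes the same DataFrame with `to_pickle()` and loads it back with `pd.read_pickle()`. The temporary directory and the `restored` and `roundtrip_ok` names are assumptions made for illustration only. They are not part of the original example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Same data as in the SetUp section
df = pd.DataFrame([
    {'firstname': 'Mark',    'lastname': 'Watney',   'role': 'botanist'},
    {'firstname': 'Melissa', 'lastname': 'Lewis',    'role': 'commander'},
    {'firstname': 'Rick',    'lastname': 'Martinez', 'role': 'pilot'},
])

# Hypothetical location: a temporary directory instead of /tmp,
# so the sketch cleans up after itself
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / 'myfile.pkl'
    df.to_pickle(path)                # write the binary pickle file
    restored = pd.read_pickle(path)   # load it back into memory

# True when values, dtypes and shape match the original
roundtrip_ok = restored.equals(df)
```

Note that unpickling can execute arbitrary code, so `pd.read_pickle()` should be used only with files from a trusted source.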

3.4.3. Assignments

# FIXME: Create a DataFrame with simpler data, without .head() and .tail(), e.g. martian. The data should include: str, int, bool, float, datetime and None

# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -v myfile.py`

# %% About
# - Name: DataFrame Export Pickle
# - Difficulty: easy
# - Lines: 3
# - Minutes: 3

# %% English
# 1. Read data from `DATA` as `result: pd.DataFrame`
# 2. When reading, use the `header=0` parameter
# 3. Select the first 146 rows, and from those the last 11
# 4. Export data from column `Event` to the file `FILE`
# 5. Data has to be in Pickle format
# 6. Run doctests - all must succeed

# %% Polish
# 1. Wczytaj dane z `DATA` jako `result: pd.DataFrame`
# 2. Przy wczytywaniu użyj parametru `header=0`
# 3. Wybierz pierwszych 146 wierszy, a z nich ostatnie 11
# 4. Wyeksportuj dane z kolumny `Event` do pliku `FILE`
# 5. Dane mają być w formacie Pickle
# 6. Uruchom doctesty - wszystkie muszą się powieść

# %% Tests
"""
>>> import sys; sys.tracebacklimit = 0
>>> from os import remove
>>> result = pd.read_pickle(FILE)
>>> remove(FILE)

>>> pd.set_option('display.width', 500)
>>> pd.set_option('display.max_columns', 10)
>>> pd.set_option('display.max_rows', 20)

>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'
>>> assert type(result) is pd.Series, \
'Variable `result` has invalid type, should be `pd.Series`'

>>> result
135                                    LM lunar landing.
136                   LM powered descent  engine cutoff.
137    Decision made to  proceed with EVA prior to fi...
138                        Preparation for EVA  started.
139                           EVA started (hatch  open).
140                 CDR completely outside  LM on porch.
141    Modular equipment  stowage assembly deployed (...
142                    First clear TV picture  received.
143    CDR at foot of ladder (starts to report, then ...
144    CDR at foot of ladder and described surface as...
145    1st step taken lunar surface (CDR). That's one...
Name: Event, dtype: object
"""

import pandas as pd

DATA = 'https://python3.info/_static/apollo11.html'
FILE = r'_temporary.pkl'

# Dump DATA to FILE in Pickle format
# type: pd.DataFrame
result = ...