3.2. ORM Use Cases

3.2.1. Base

>>> from dataclasses import dataclass
>>>
>>>
>>> @dataclass
... class Astronaut:
...     firstname: str
...     lastname: str
>>>
>>>
>>> CREW = [
...     Astronaut('Mark', 'Watney'),
...     Astronaut('Melissa', 'Lewis'),
...     Astronaut('Rick', 'Martinez'),
... ]
../../_images/oop-relations-base1.png

3.2.2. Extend

>>> from dataclasses import dataclass
>>>
>>>
>>> @dataclass
... class Astronaut:
...     firstname: str
...     lastname: str
...     role: str
>>>
>>>
>>> CREW = [
...     Astronaut('Mark', 'Watney', 'Botanist'),
...     Astronaut('Melissa', 'Lewis', 'Commander'),
...     Astronaut('Rick', 'Martinez', 'Pilot'),
... ]
../../_images/oop-relations-extend11.png
>>> from dataclasses import dataclass
>>>
>>>
>>> @dataclass
... class Astronaut:
...     firstname: str
...     lastname: str
...     role: str
...     mission_year: int
...     missions_name: str
>>>
>>>
>>> CREW = [
...     Astronaut('Mark', 'Watney', 'Botanist', 2035, 'Ares 3'),
...     Astronaut('Melissa', 'Lewis', 'Commander', 2035, 'Ares 3'),
...     Astronaut('Rick', 'Martinez', 'Pilot', 2035, 'Ares 3'),
... ]
../../_images/oop-relations-extend21.png

3.2.3. Boolean Vector

>>> from dataclasses import dataclass
>>>
>>>
>>> @dataclass
... class Mission:
...     year: int
...     name: str
>>>
>>>
>>> @dataclass
... class Astronaut:
...     firstname: str
...     lastname: str
...     role: str
...     missions: list[Mission]
>>>
>>>
>>> CREW = [
...     Astronaut('Mark', 'Watney', 'Botanist', missions=[
...         Mission(2035, 'Ares 3')]),
...     Astronaut('Melissa', 'Lewis', 'Commander', missions=[
...         Mission(2035, 'Ares 3'),
...         Mission(2031, 'Ares 1')]),
...     Astronaut('Rick', 'Martinez', 'Pilot', missions=[]),
... ]
../../_images/oop-relations-boolvector1.png

3.2.4. FFill

>>> from dataclasses import dataclass
>>>
>>>
>>> @dataclass
... class Mission:
...     year: int
...     name: str
>>>
>>>
>>> @dataclass
... class Astronaut:
...     firstname: str
...     lastname: str
...     role: str
...     missions: list[Mission]
>>>
>>>
>>> CREW = [
...     Astronaut('Mark', 'Watney', 'Botanist', missions=[
...         Mission(2035, 'Ares 3')]),
...     Astronaut('Melissa', 'Lewis', 'Commander', missions=[
...         Mission(2035, 'Ares 3'),
...         Mission(2031, 'Ares 1')]),
...     Astronaut('Rick', 'Martinez', 'Pilot', missions=[]),
... ]
../../_images/oop-relations-ffill-empty1.png
../../_images/oop-relations-ffill-dash1.png
../../_images/oop-relations-ffill-duplicate1.png
../../_images/oop-relations-ffill-uniqid1.png

3.2.5. Relations

>>> from dataclasses import dataclass
>>>
>>>
>>> @dataclass
... class Mission:
...     year: int
...     name: str
>>>
>>>
>>> @dataclass
... class Astronaut:
...     firstname: str
...     lastname: str
...     role: str
...     missions: list[Mission]
>>>
>>>
>>> CREW = [
...     Astronaut('Mark', 'Watney', 'Botanist', missions=[
...         Mission(2035, 'Ares 3')]),
...     Astronaut('Melissa', 'Lewis', 'Commander', missions=[
...         Mission(2035, 'Ares 3'),
...         Mission(2031, 'Ares 1')]),
...     Astronaut('Rick', 'Martinez', 'Pilot', missions=[]),
... ]
../../_images/oop-relations-rel-m2o1.png
../../_images/oop-relations-rel-m2m1.png

3.2.6. Serialization

>>> from dataclasses import dataclass
>>>
>>>
>>> @dataclass
... class Mission:
...     year: int
...     name: str
>>>
>>>
>>> @dataclass
... class Astronaut:
...     firstname: str
...     lastname: str
...     role: str
...     missions: list[Mission]
>>>
>>>
>>> CREW = [
...     Astronaut('Mark', 'Watney', 'Botanist', missions=[
...         Mission(2035, 'Ares 3')]),
...     Astronaut('Melissa', 'Lewis', 'Commander', missions=[
...         Mission(2035, 'Ares 3'),
...         Mission(2031, 'Ares 1')]),
...     Astronaut('Rick', 'Martinez', 'Pilot', missions=[]),
... ]
../../_images/oop-relations-serialize-cls1.png
../../_images/oop-relations-serialize-obj1.png
../../_images/oop-relations-serialize-objattr1.png
../../_images/oop-relations-serialize-clsattr1.png
../../_images/oop-relations-data-01.png

3.2.7. Recap

  • DBA and Programmers use different data format than Data Scientists

  • Data Scientists prefer flat formats, without relations and joins

  • DBA and Programmers prefer relational data

  • For DBA and Programmers flat data formats represents data duplication

  • Normalization make data manipulation more consistent

  • Normalization uses less space and makes UPDATEs easier

  • Normalization causes a lot of SELECT and JOINs, which requires computation

  • In XXI century storage is cheap, computing power cost money

  • Currently SELECTs are far more common than INSERTs and UPDATEs (let say 80%-15%-5% - just a rough estimate, please don't quote this number)

  • Normalization does not work at large (big-data) scale

  • Big data requires simplified approach, and typically without any relations

  • Data consistency then is achieved by business logic