8.15. ORM About

  • ORM - Object Relational Mapping

  • Process of associating object oriented classes with database tables

  • Set of object oriented classes is a domain model (business model)

  • The most basic task is to translate between domain object and a table row

  • Any tool which takes database row and converts this to an object is an ORM

  • ORM represents basic compositions: one-to-many, many-to-one using foreign key

  • ORM allows to querying the database in terms of the domain model structure

  • Some ORM can represent class inheritance hierarchies using variety of schemes

  • Some ORMs can handle 'sharding' of data, i.e. storing a domain model across multiple schemas or databases

  • Provide various patterns for concurrency, including row versioning

  • Provide patterns for data validation and coercion

Source: [1]

../../_images/sqlalchemy-onion.png
../../_images/orm-schematics.png
../../_images/orm-roles.png

8.15.1. Flavors

  • Active Record or Data Mapper

  • Declarative or Imperative style configuration

8.15.2. Active Record

  • Active Record has domain objects handle their own persistence

  • Every object is a row in a table

  • Notion of objects working in a transaction is a secondary notion

Create an object:

>>> 
... mark = Astronaut(firstname='Mark', lastname='Watney')
... mark.save()

Usage:

>>> 
... mark = User.query(firstname='Mark', lastname='Watney').fetch()
... mark.firstname = 'Melissa'
... mark.lastname = 'Lewis'
... mark.save()

8.15.3. Data Mapper

  • Tries to keep the details of persistence separate from the object being persisted

  • It is more explicit

  • Does not use globals and threaded locals

  • Always create connection, transaction and explicitly say when to commit

Create an object:

>>> 
... with Session.begin() as session:
...     mark = Astronaut(firstname='Mark', lastname='Watney')
...     session.add(astro)

Usage:

>>> 
... query = (
...     select(Astronaut).
...     where(Astronaut.firstname == 'Mark').
...     where(Astronaut.lastname == 'Watney').
...     scalars().
...     first())
...
... with Session.begin() as session:
...     astro = session.execute()
...     astro.firstname = 'Melissa'
...     astro.lastname = 'Lewis'

8.15.4. Declarative Style Configuration

  • Classes and attributes

  • Attributes names columns in a database

ORMs may also provide different configuration patterns. Most use an 'all-at-once' style where class and table information is together. SQLAlchemy calls this declarative style [1].

>>> 
... class Astronaut(Base):
...     __tablename__ = 'astronaut'
...     id = Column(Integer, primary_key=True)
...     firstname = Column(String(length=100))
...     lastname = Column(String(length=100))
...
...
... class Mission(Base):
...     __tablename__ = 'mission'
...     id = Column(Integer, primary_key=True)
...     astronaut_id = Column(ForeignKey('astronaut.id'))
...     year = Column(Integer, nullable=False)
...     name = Column(String(length=50), nullable=False)

8.15.5. Imperative Style Configuration

  • There was a plan to remove Imperative Style from SQLAlchemy 2.0, but stayed

  • The class is not completely agnostic, because mapper heavily influence design

This other way is to keep the declaration of domain model and table metadata separate. SQLAlchemy calls this imperative style [1].

Class is declared without any awareness of database:

>>> 
... class Astronaut:
...     def __init__(self, firstname, lastname):
...         self.firstname = firstname
...         self.lastname = lastname

Then it is associated with a database table:

>>> 
... registry.mapper(
...     Astronaut,
...     Table('astronaut', metadata,
...         Column('id', Integer, primary_key=True),
...         Column('firstname', String(50)),
...         Column('lastname', String(50)),
...     )
... )

8.15.6. SQLAlchemy ORM

  • SQLAlchemy ORM is essentially a data mapper style ORM

  • Most users use declarative configuration style

  • Imperative style and a range of variants in between are supported as well

  • Extends SQLAlchemy Core, in particular extending the SQL Expression language

  • Designed to work with domain classes as well as table constructs

  • Key features: Unit of Work, Identity Map, Lazy / Eager loading

  • Unit of Work - accumulates INSERT/UPDATE/DELETE statements and transparently sends it to the database in batch

  • Identity Map - objects are kept unique in memory based on their primary key identity

  • Lazy / Eager loading - related attributes and collections can be loaded either on-demand (lazy) or upfront (eager)

  • Source: [1]

8.15.7. ORM

  • SQLAlchemy mappings in 1.4/2.0 start with a central object known as 'registry'

  • Has a collection of metadata inside it

  • Traditional Declarative Base uses Python metaclass

  • This gets in a way, when you want to uses metaclass on your own

  • In such case you can use mapper registry decorator

Using the registry, we can map classes in various ways, below illustrated using its 'mapped' decorator. In this form, we arrange class attributes in terms of Column objects to be mapped to a Table, which is named based on attribute __tablename__ [1].

First create an instance of a Mapper Registry object:

>>> from sqlalchemy.orm import registry
>>>
>>> mapper_registry = registry()

The Mapper object mediates the relationship between model and a Table object. This mapper is generally behind the scence and accessible.

Then specify the class using mapper registry decorator:

>>> from sqlalchemy import Column, Integer, String
>>> from sqlalchemy.orm import registry
>>>
>>>
>>> Models = registry()
>>>
>>> @Models.mapped
... class Astronaut:
...     __tablename__ = 'astronaut'
...     id = Column(Integer, primary_key=True)
...     firstname = Column(String(length=100))
...     lastname = Column(String(length=100))
...
...     def __repr__(self):
...         firstname = self.firstname
...         lastname = self.lastname
...         return f'Astronaut({firstname=}, {lastname=})'

The Astronaut class has now a Table object associated with it.

>>> Astronaut.__table__
Table('astronaut', MetaData(), Column('id', Integer(), table=<astronaut>, primary_key=True, nullable=False), Column('firstname', String(length=100), table=<astronaut>), Column('lastname', String(length=100), table=<astronaut>), schema=None)
>>> from sqlalchemy import select
>>>
>>>
>>> query = select(Astronaut)
>>>
>>> print(query)
SELECT astronaut.id, astronaut.firstname, astronaut.lastname
FROM astronaut

If you do not specify the constructor, it will be automatically generated for you based on the attributes (id, firstname, lastname) making them an optional keyword parameters. All parameters are optional, because some of them can be autogenerated, for example: id [1].

>>> astro = Astronaut(firstname='Mark', lastname='Watney')
>>> astro
Astronaut(firstname='Mark', lastname='Watney')

Using our registry (Models), we can create a database schema for this class using a MetaData object that is path of the registry:

>>> from sqlalchemy import create_engine
>>>
>>>
>>> engine = create_engine('sqlite:///:memory:')
>>>
>>> with engine.begin() as db:
...     Models.metadata.create_all(db)

To persists and load Astronaut objects from the database, we use a Session object, illustrated here from a factory called sessionmaker. The Session objects makes use of a connection factory (i.e. an Engine) and will handle the job of connecting, committing and releasing connections to this engine. Flag future=True in SQLAlchemy 1.4 will turn on 2.0 compatibility mode. This behavior will be default in 2.0 and flag will be deprecated.

>>> from sqlalchemy.orm import sessionmaker
>>>
>>>
>>> Session = sessionmaker(bind=engine, future=True)
>>> session = Session()

Creating a session does not implies connection. This is done lazily and will simply create an object and do nothing. Sessions will always delay database connection to the last possible moment, but it will also ensure that this will eventually happen.

8.15.8. Object Statuses

  • Transient - object created, but not yet added to the session

  • Pending - object added to a session but not yet stored in database

  • Persistent - represent an active row in a database (object is stored)

  • Detached

  • Pending Delete

8.15.9. Adding Objects

Let's create an transient object (object not yet added to the session):

>>> mark = Astronaut(firstname='Mark', lastname='Watney')

New objects are placed into the Session using add()

>>> session.add(mark)

This did not modify the database, however the object is now known as 'pending'. We can see the 'pending' objects by looking at the session.new attribute.

>>> session.new
IdentitySet([Astronaut(firstname='Mark', lastname='Watney')])

We can now query for this 'pending' row, by emitting a SELECT statement that will refer to Astronaut entities. This will first autoflush the pending changes, then SELECT the row we requested.

>>> from sqlalchemy import select
>>>
>>>
>>> query = (
...     select(Astronaut).
...     where(firstname=='Mark'))
>>>
>>> result = session.execute(query)

Session will autoflush before making queries, that is it will store all the pending objects before querying it. Session will delay this to the last possible moment. You can turn this behavior off by specifying a keyword argument autoflush=False to the sessionmaker factory.

We can get the data back from the result, in this case using the .scalar() method which will return the first column of the first row.

>>> mark = result.scalar()
>>> mark
Astronaut(firstname='Mark', lastname='Watney')

The Astronaut object we've inserted now has a value for .id attribute.

>>> mark.id
1

The Session maintains a 'unique' object per identity. So astro and mark are the same object.

>>> mark is astro
True

Identity Map - if you query the database table for the object with for example id==1 you will get the same object every time, as long as this object is in the memory. We can look at it on the Session.

>>> session.identity_map.items()
[((__main__.Astronaut, (1,), None), Astronaut(firstname='Mark', lastname='Watney'))]

8.15.10. Making Changes

  • Add more objects to be pending for flush

  • .add_all() is the same as .add(), but adds a list of objects

>>> session.add_all([
...     Astronaut(firstname='Melissa', lastname='Lewis'),
...     Astronaut(firstname='Rick', lastname='Martinez'),
... ])

Modify astro - the object is now marked as 'dirty'

>>> astro.firstname = 'Alex'
>>> astro.lastname = 'Vogel'

Nothing changed and no actions were performed to the database yet. If you inspect database current transactions you will have an open transaction process currently in progress.

The Session can us which objects are dirty:

>>> session.dirty
IdentitySet([Astronaut(firstname='Alex', lastname='Vogel')])

And can also tell us which objects are pending:

>>> session.new
IdentitySet([Astronaut(firstname='Melissa', lastname='Lewis'), Astronaut(firstname='Rick', lastname='Martinez')])

The whole transaction is committed. Commit always triggers a final flush of remaining changes. Commit will expire objects. This is due to the fact, that as soon as data is out there (in database), some other transactions could have already change the data. You can change this behavior by setting the expire_on_commit=False parameter to the sessionmaker factory.

>>> session.commit()

After a commit, there's no transaction. The Session 'invalidates' all data, so that accessing them will automatically start a 'new' transaction and re-load from the database. This is our first example of the ORM 'lazy loading' pattern.

>>> astro.firstname

8.15.11. Rolling Back Changes

Make another 'dirty' change, and another 'pending' change, that we might change or minds about.

>>> astro.firstname = 'Beth'
>>> astro.lastname = 'Johanssen'
>>>
>>> chris = Astronaut(firstname='Chris', lastname='Beck')
>>> session.add(chris)

Run a query, our changes are flushed; results come back.

>>> query = (
...     select(Astronaut).
...     where(Astronaut.firstname.in_(['Beth', 'Chris'])))
>>>
>>> result = session.execute(query)
>>> result.all()

Those changes are not yet in the database. The transaction was not committed yet. Therefore if your database will be restarted you will loose those information, unless non-default transaction durability options are set in the database configuration.

But we're inside of a transaction. Roll it back:

>>> session.rollback()

All updates and inserts are gone, and all pending objects are evicted. Again, the transaction is over, objects are expired. Accessing an attribute refreshes the object and the astro firstname is gone [1].

>>> astro in session
False

And the data is gone from database too.

>>> query = (
...     select(Astronaut).
...     where(Astronaut.firstname.in_(['Beth', 'Chris'])))
>>>
>>> result = session.execute(query)
>>> result.all()
[]

8.15.12. ORM Querying

The attributes on our mapped classes act like Column objects, and produce SQL expressions [1].

>>> expression = (Astronaut.firstname == 'Mark')
>>>
>>> print(expression)
astronaut.firstname = :firstname_1
>>> expression = Astronaut.__table__.c.firstname == 'Mark'
>>>
>>> print(expression)
astronaut.firstname = :firstname_1

Fot the above example, although output is similar, they produce a different objects.

When ORM-specific expressions are used with select(), the Select construct itself takes an ORM-enabled features, the most basic of which is that it can discern between selecting from 'columns' vs 'entities'. Below the SELECT is to return rows that contain a single element, which would be an instance of Astronaut. This is translated from the actual SELECT sent to the database that SELECTs for the individual columns of the Astronaut entity [1].

>>> query = (
...     select(Astronaut).
...     where(Astronaut.firstname == 'Mark').
...     order_by(Astronaut.id))

Introspection:

>>> query._raw_columns[0]
Table('astronaut', MetaData(), Column('id', Integer(), table=<astronaut>, primary_key=True, nullable=False), Column('firstname', String(length=100), table=<astronaut>), Column('lastname', String(length=100), table=<astronaut>), schema=None)
>>>
>>> query._raw_columns[0]._annotations  
immutabledict({'entity_namespace': <Mapper at 0x11bc942b0; Astronaut>, 'parententity': <Mapper at 0x...; Astronaut>, 'parentmapper': <Mapper at 0x...; Astronaut>})

The rows we get back from Session.execute() then contain Astronaut objects as the first element in each row [1].

>>> result = session.execute(query)
>>>
>>> for row in result:
...     print(row)
...
(Astronaut(firstname='Mark', lastname='Watney'),)

As it is typically convenient for rows that only have a single element to be delivered as the element alone, we can use the .scalars() method of Result as we did earlier to return just the first column of each row [1].

>>> result = session.execute(query)
>>>
>>> for row in result.scalars():
...     print(row)
...
Astronaut(firstname='Mark', lastname='Watney')

We can also qualify the rows we want to get back with methods like .one() [1]:

>>> result = session.execute(query)
>>> mark = result.scalars().one()
>>>
>>> print(astro)
Astronaut(firstname='Mark', lastname='Watney')

An ORM query can make use of any combination of columns and entities. To request the fields of Astronaut separately, we name them separately in the columns clause [1].

>>> query = select(Astronaut.firstname, Astronaut.lastname)
>>> result = session.execute(query)
>>>
>>> for row in result:
...     print(f'{row.firstname}, {row.lastname}')
...
Mark, Watney
Melissa, Lewis
Rick, Martinez
>>> query = select(Astronaut.firstname, Astronaut.lastname)
>>> result = session.execute(query)
>>>
>>> for firstname, lastname in result:
...     print(f'{firstname=}, {lastname=}')
...
firstname='Mark', lastname='Watney'
firstname='Melissa', lastname='Lewis'
firstname='Rick', lastname='Martinez'

You can combine 'entities' and columns together:

>>> query = select(Astronaut, Astronaut.firstname)
>>> result = session.execute(query)
>>>
>>> for row in result:
...     print(f'{row.Astronaut.id}, {row.firstname}, {row.Astronaut.lastname}')
...
1, Mark, Watney
2, Melissa, Lewis
3, Rick, Martinez

The WHERE clause is either by .filter_by(), which is convenient:

>>> query = (
...     select(Astronaut.firstname, Astronaut.lastname).
...     filter_by(firstname='Mark'))
>>>
>>> result = session.execute(query)
>>>
>>> for firstname, lastname in result:
...     print(f'{firstname=}, {lastname=}')
...
firstname='Mark', lastname='Watney'

Or where() for more explicitness:

>>> query = (
...     select(Astronaut).
...     where(Astronaut.firstname == 'Mark').
...     where(Astronaut.lastname == 'Watney'))
>>>
>>> result = session.execute(query)
>>>
>>> for row in result.scalars():
...     print(f'{firstname=}, {lastname=}')
...
firstname='Mark', lastname='Watney'

8.15.13. Relationships, Joins

Start with the same mapping as before. Except we will also give it a one-to-many relationship to a second entity.

>>> from sqlalchemy import ForeignKey, Column, Integer, String
>>> from sqlalchemy.orm import registry, relationship
>>>
>>>
>>> Models = registry()
>>>
>>> @Models.mapped
... class Astronaut:
...     __tablename__ = 'astronaut'
...     id = Column(Integer, primary_key=True)
...     firstname = Column(String(length=100))
...     lastname = Column(String(length=100))
...     missions = relationship('Mission', back_populates='astronaut')
...
...     def __repr__(self):
...         firstname = self.firstname
...         lastname = self.lastname
...         return f'Astronaut({firstname=}, {lastname=})'
>>>
>>>
>>> @Models.mapped
... class Mission:
...     __tablename__ = 'mission'
...     id = Column(Integer, primary_key=True)
...     astronaut_id = Column(ForeignKey('astronaut.id'))
...     year = Column(Integer, nullable=False)
...     name = Column(String(length=50), nullable=False)
...     astronaut = relationship('Astronaut', back_populates='missions')
...
...     def __repr__(self):
...         year = self.year
...         name = self.name
...         return f'Mission({year=}, {name=})'

For the other end of one-to-many, create another mapped class with a ForeignKey referring back to Astronaut. ForeignKey field is a SQLAlchemy core's thing and relationship field is for ORM's. Note, that it is not needed to specify type of the relationship (one-to-many, many-to-one, or many-to-many) as of relationship() will infer this by the column type (ForeignKey -> one-to-many) [1].

Create tables

>>> from sqlalchemy import create_engine
>>>
>>>
>>> engine = create_engine('sqlite:///:memory:')
>>>
>>> with engine.begin() as db:
...     Models.metadata.create_all(db)

Will produce:

BEGIN
PRAGMA main.table_info("astronaut")
PRAGMA temp.table_info("astronaut")
PRAGMA main.table_info("mission")
PRAGMA temp.table_info("mission")

CREATE TABLE astronaut (
    id INTEGER NOT NULL,
    firstname VARCHAR(100),
    lastname VARCHAR(100),
    PRIMARY KEY (id)
)

CREATE TABLE mission (
    id INTEGER NOT NULL,
    astronaut_id INTEGER,
    year INTEGER NOT NULL,
    name VARCHAR(50) NOT NULL,
    PRIMARY KEY (id),
    FOREIGN KEY(astronaut_id) REFERENCES astronaut (id)
)
COMMIT

Insert data in the Astronaut table. Here we illustrate the sessionmaker factory as a transactional context manager [1]:

>>> from sqlalchemy.orm import sessionmaker
>>>
>>>
>>> Session = sessionmaker(bind=engine, future=True)
>>>
>>> with Session.begin() as  session:
...     session.add_all([
...         Astronaut(firstname='Mark', lastname='Watney'),
...         Astronaut(firstname='Melissa', lastname='Lewis'),
...         Astronaut(firstname='Rick', lastname='Martinez'),
...     ])

1.4/2.0 tries to make more consistent. Session.begin() is analogous to Engine.begin(). Sessionmaker is analogous to core engine. And the session itself is analogous to core connection.

A new Astronaut object also gains an empty missions collection now.

>>> alex = Astronaut(firstname='Alex', lastname='Vogel')
>>> alex.missions
[]

Populate this collection with new Address objects.

>>> alex.missions = [
...     Mission(year=2030, name='Ares1'),
...     Mission(year=2035, name='Ares3'),
... ]

'Back populates' sets up Mission.astronaut for each Astronaut.mission

>>> alex
Astronaut(firstname='Alex', lastname='Vogel')
>>>
>>> alex.missions
[Mission(year=2030, name='Ares1'), Mission(year=2035, name='Ares3')]
>>>
>>> alex.missions[0]
Mission(year=2030, name='Ares1')
>>>
>>> alex.missions[0].astronaut
Astronaut(firstname='Alex', lastname='Vogel')

You can specify the relation only in one way, but usually people will do it both-ways for easy of use.

Adding alex will 'cascade' each Astronaut into the Session as well.

>>> session = Session()
>>> session.add(alex)
>>> session.new
IdentitySet([Astronaut(firstname='Alex', lastname='Vogel'), Mission(year=2030, name='Ares1'), Mission(year=2035, name='Ares3')])

Now we commit the changes to the database.

>>> session.commit()

ORM must know which object goes first, and then it uses its id to fill the ForeignKey fields of related objects. SQLAlchemy does that automatically.

After expiration, alex.missions emits a 'lazy load' when first accessed:

>>> alex.missions
[Mission(year=2030, name='Ares1'), Mission(year=2035, name='Ares3')]

The collection stays in memory until the transaction ends.

>>> alex.missions
[Mission(year=2030, name='Ares1'), Mission(year=2035, name='Ares3')]

Collections and references are updated by manipulating objects themselves; setting up of foreign key column values is handled automatically.

>>> from sqlalchemy import select
>>>
>>>
>>> query = (
...     select(Astronaut).
...     filter_by(firstname='Mark'))
>>>
>>> mark = session.execute(query).scalar_one()
>>> alex.missions
[Mission(year=2030, name='Ares1'), Mission(year=2035, name='Ares3')]
>>>
>>> mark.missions
[]
>>> alex.missions[1].astronaut = mark
>>>
>>> alex.missions
[Mission(year=2030, name='Ares1')]
>>>
>>> mark.missions
[Mission(year=2035, name='Ares3')]

By assigning .astronaut on one of the alex missions, the object moved from one missions collection to the other. This is the back populates feature at work.

8.15.14. Querying with Multiple Tables

A SELECT statement can select from multiple entities simultaneously.

>>> query = (
...     select(Astronaut, Mission).
...     where(Astronaut.id == Mission.astronaut_id))
>>>
>>> result = session.execute(query)
>>>
>>> for row in result:
...     print(row)
...
(Astronaut(firstname='Alex', lastname='Vogel'), Mission(year=2030, name='Ares1'))
(Astronaut(firstname='Mark', lastname='Watney'), Mission(year=2035, name='Ares3'))

Or unpack the results. We know that there will be two objects in a tuple because we did select(Astronaut, Mission).

>>> query = (
...     select(Astronaut, Mission).
...     where(Astronaut.id == Mission.astronaut_id))
>>>
>>> result = session.execute(query)
>>>
>>> for astronaut, mission in result:
...     print(f'{astronaut=}, {mission=}')
...
astronaut=Astronaut(firstname='Alex', lastname='Vogel'), mission=Mission(year=2030, name='Ares1')
astronaut=Astronaut(firstname='Mark', lastname='Watney'), mission=Mission(year=2035, name='Ares3')

As is the same case in Core, we use the select().join() method to create joins. An entity can be given as the target which will join along foreign keys.

>>> query = (
...     select(Astronaut, Mission).
...     join(Mission))
>>>
>>> result = session.execute(query)
>>>
>>> result.all()  
[(Astronaut(firstname='Alex', lastname='Vogel'), Mission(year=2030, name='Ares1')),
 (Astronaut(firstname='Mark', lastname='Watney'), Mission(year=2035, name='Ares3'))]

Or you can give it an explicit SQL expression for the ON clause.

>>> query = (
...     select(Astronaut, Mission).
...     join(Mission, Astronaut.id == Mission.astronaut_id))
>>>
>>> result = session.execute(query)
>>>
>>> result.all()  
[(Astronaut(firstname='Alex', lastname='Vogel'), Mission(year=2030, name='Ares1')),
 (Astronaut(firstname='Mark', lastname='Watney'), Mission(year=2035, name='Ares3'))]

However the most accurate and succinct way is to use the relationship-bound attribute.

>>> query = (
...     select(Astronaut, Mission).
...     join(Astronaut.missions))
>>>
>>> result = session.execute(query)
>>>
>>> result.all()  
[(Astronaut(firstname='Alex', lastname='Vogel'), Mission(year=2030, name='Ares1')),
 (Astronaut(firstname='Mark', lastname='Watney'), Mission(year=2035, name='Ares3'))]

All three methods should result the same data.

Note, that join(Astronaut.missions) is only available in ORM, because missions attributes is an ORM relationship.

>>> print(query)
SELECT astronaut.id, astronaut.firstname, astronaut.lastname, mission.id AS id_1, mission.astronaut_id, mission.year, mission.name
FROM astronaut JOIN mission ON astronaut.id = mission.astronaut_id

The ORM version of table.alias() is to use the aliased() function on mapped entity.

>>> from sqlalchemy.orm import aliased
>>>
>>>
>>> m1 = aliased(Mission)
>>> m2 = aliased(Mission)
>>>
>>> query = (
...     select(Astronaut).
...     join_from(Astronaut, m1).
...     join_from(Astronaut, m2).
...     where(m1.name == 'Ares1').
...     where(m2.name == 'Ares3'))
>>>
>>> result = session.execute(query)
>>> result.all()
[(Astronaut(firstname='Alex', lastname='Vogel'),)]
>>> print(query)
SELECT astronaut.id, astronaut.firstname, astronaut.lastname
FROM astronaut JOIN mission AS mission_1 ON astronaut.id = mission_1.astronaut_id JOIN mission AS mission_2 ON astronaut.id = mission_2.astronaut_id
WHERE mission_1.name = :name_1 AND mission_2.name = :name_2

To join() to an aliased() object with more specificity, a form such Class.relationship.of_type(aliased) may be used:

>>> from sqlalchemy.orm import aliased
>>>
>>>
>>> m1 = aliased(Mission)
>>> m2 = aliased(Mission)
>>>
>>> query = (
...     select(Astronaut).
...     join(Astronaut.missions.of_type(m1)).
...     join(Astronaut.missions.of_type(m2)).
...     where(m1.name == 'Ares1').
...     where(m2.name == 'Ares3'))
>>>
>>> result = session.execute(query)
>>> result.all()
[(Astronaut(firstname='Alex', lastname='Vogel'),)]

Useful for querying objects which has special conditions, such as: is_deleted=False flag, or newer than particular date.

As was the case with Core, we can use subqueries and joins with ORM mapped classes as well.

>>> from sqlalchemy import func
>>>
>>>
>>> subquery = (
...     select(func.count(Mission.id).label('count'), Mission.astronaut_id).
...     group_by(Mission.astronaut_id)
...     subquery())
>>>
>>> query = (
...     select(Astronaut.firstname, func.coalesce(subquery.c.count, 0)).
...     outerjoin(subquery, Astronaut.id == subquery.c.astronaut_id))
>>>
>>> result = session.execute(query)
>>> result.all()
[('Mark', 1), ('Melissa', 0), ('Rick', 0), ('Alex', 1)]

CTEs works the same way too.

8.15.15. Eager Loading

The N plus one problem is an ORM issue which refers to the many SELECT statements emitted when loading collections against a parent result. As SQLAlchemy is a full featured ORM it has the same problem. This is the biggest and the most famous problem of the ORM.

Lazy loaded N+one prone code:

>>> query = select(Astronaut)
>>>
>>> with Session() as session:
...     result = session.execute(query)
...     for astronaut in result.scalars():
...         print(astronaut, astronaut.missions)
...
Astronaut(firstname='Mark', lastname='Watney') []
Astronaut(firstname='Melissa', lastname='Lewis') []
Astronaut(firstname='Rick', lastname='Martinez') []
Astronaut(firstname='Alex', lastname='Vogel') [Mission(year=2030, name='Ares1'), Mission(year=2035, name='Ares3')]

However, SQLAlchemy was designed from the start to tame the 'N plus one' problem by implementing 'eager loading'. Eager loading is now very mature, and the most effective strategy for collections is currently the selectinload option:

>>> from sqlalchemy.orm import selectinload
>>>
>>>
>>> query = (
...     select(Astronaut).
...     options(selectinload(Astronaut.missions)))
>>>
>>> with Session() as session:
...     result = session.execute(query)
...     for astronaut in result.scalars():
...         print(astronaut, astronaut.missions)
...
Astronaut(firstname='Mark', lastname='Watney') []
Astronaut(firstname='Melissa', lastname='Lewis') []
Astronaut(firstname='Rick', lastname='Martinez') []
Astronaut(firstname='Alex', lastname='Vogel') [Mission(year=2030, name='Ares1'), Mission(year=2035, name='Ares3')]

The oldest eager loading strategy is joinedload(). This uses LEFT OUTER JOIN or INNER JOIN to load parent + child on one query. joinedload() can work for collections as well, however it is best tailored towards many-to-one relationships, particularly those where the foreign key is NOT NULL.

>>> from sqlalchemy.orm import joinedload
>>>
>>>
>>> query = (
...     select(Mission).
...     options(joinedload(Mission.astronaut, innerjoin=True)))
>>>
>>> with Session() as session:
...     result = session.execute(query)
...     for mission in result.scalars():
...         print(mission, mission.astronaut.firstname)
...
Mission(year=2030, name='Ares1') Alex
Mission(year=2035, name='Ares3') Alex

Note, Eager loading 'does not' change the result of the Query. Only how related collections are loaded. An explicit join() can be mixed with the joinedload() and they are kept separate.

>>> from sqlalchemy.orm import joinedload
>>>
>>>
>>> query = (
...     select(Mission).
...     join(Mission.astronaut).
...     where(Astronaut.firstname == 'Alex').
...     options(joinedload(Mission.astronaut)))
>>>
>>> with Session() as session:
...     result = session.execute(query)
...     for mission in result.scalars():
...         print(mission, mission.astronaut)
Mission(year=2030, name='Ares1') Astronaut(firstname='Alex', lastname='Vogel')
Mission(year=2035, name='Ares3') Astronaut(firstname='Alex', lastname='Vogel')

To optimize the common case of 'join to many-to-one and also load it on the object', the contains_eager() option is used

>>> from sqlalchemy.orm import contains_eager
>>>
>>>
>>> query = (
...     select(Mission).
...     join(Mission.astronaut).
...     where(Astronaut.firstname == 'Alex').
...     options(contains_eager(Mission.astronaut)))
>>>
>>> with Session() as session:
...     result = session.execute(query)
...     for mission in result.scalars():
...         print(mission, mission.astronaut)
Mission(year=2030, name='Ares1') Astronaut(firstname='Alex', lastname='Vogel')
Mission(year=2035, name='Ares3') Astronaut(firstname='Alex', lastname='Vogel')

8.15.16. References