8.12. SQLAlchemy Core Join

✘>>> from sqlalchemy import update

8.12.1. SetUp

✘>>> from sqlalchemy import create_engine, MetaData, Table, Column, ForeignKey
>>> from sqlalchemy import Integer, String, Date, Numeric, Enum
>>> from sqlalchemy import select
>>>
>>>
>>> engine = create_engine('sqlite:///:memory:', future=True)
>>> metadata = MetaData()
>>>
>>> astronaut = Table('astronaut', metadata,
...     Column('id', Integer, primary_key=True),
...     Column('firstname', String(50), nullable=False),
...     Column('lastname', String(50), nullable=False),
...     Column('agency', Enum('NASA', 'ESA', 'POLSA')),
...     Column('birthdate', Date),
...     Column('age', Integer),
...     Column('height', Numeric(3,2)),
...     Column('weight', Numeric(3,2)),
... )
>>>
>>> mission = Table('mission', metadata,  # one to many relationship
...     Column('id', Integer, primary_key=True),
...     Column('astronaut_id', ForeignKey('astronaut.id'), nullable=False),
...     Column('year', Integer, nullable=False),
...     Column('name', String(50), nullable=False),
... )
>>>
>>> ASTRONAUTS = [
...     {'firstname': 'Mark', 'lastname': 'Watney'},
...     {'firstname': 'Melissa', 'lastname': 'Lewis'},
...     {'firstname': 'Rick', 'lastname': 'Martinez'},
... ]
>>>
>>> MISSIONS = [
...     {'astronaut_id': 1, 'year': 2035, 'name': 'Ares3'},
...     {'astronaut_id': 2, 'year': 2030, 'name': 'Ares1'},
...     {'astronaut_id': 2, 'year': 2035, 'name': 'Ares3'},
...     {'astronaut_id': 3, 'year': 2035, 'name': 'Ares3'},
... ]
>>>
>>> with engine.begin() as db:
...     metadata.create_all(db)
...     result = db.execute(astronaut.insert(), ASTRONAUTS)
...     result = db.execute(mission.insert(), MISSIONS)

8.12.2. Select Multiple Tables

SELECT from more than one table
This will render CARTESIAN JOIN
This can crash systems if tables are big
Since SQLAlchemy 1.4 library will warn on cartesian products

The select() construct will include in the FROM clause all those tables that we mention in the columns clause or WHERE clause. By default, they are separated by a coma [1].

✘>>> query = select(astronaut.c.firstname, mission.c.name)
>>>
>>> print(query)
SELECT astronaut.firstname, mission.name
FROM astronaut, mission

Selecting from multiple tables without relating them to each other produces an effect known as 'cartesian product'. SQLAlchemy will usually warn when this is detected during statement execution [1].

✘>>> query = select(astronaut.c.firstname, mission.c.name)
>>>
>>> with engine.begin() as db:
...     result = db.execute(query)
... # SAWarning: SELECT statement has a cartesian product between FROM
... # element(s) "astronaut" and FROM element "mission".  Apply join
... # condition(s) between each element to resolve.
>>>
>>> result.all()
[('Mark', 'Ares3'),
 ('Mark', 'Ares1'),
 ('Mark', 'Ares3'),
 ('Mark', 'Ares3'),
 ('Melissa', 'Ares3'),
 ('Melissa', 'Ares1'),
 ('Melissa', 'Ares3'),
 ('Melissa', 'Ares3'),
 ('Rick', 'Ares3'),
 ('Rick', 'Ares1'),
 ('Rick', 'Ares3'),
 ('Rick', 'Ares3')]

Using warnings filter can turn Warning to Error. This is useful for a CI/CD and a pre-production tests.

Generated SQL statement:

✘>>> print(query)
SELECT astronaut.firstname, mission.name
FROM astronaut, mission

8.12.3. Join From

New in SQLAlchemy 1.4
Have some additional features than join()
More explicitly
Is better to start chain of joins

When we have more than one table mentioned, we want to relate them together, which is most easily achieved using join_from() [1].

✘>>> query = (
...     select(astronaut.c.firstname, mission.c.name).
...     join_from(astronaut, mission)
... )
>>>
>>> with engine.begin() as db:
...     result = db.execute(query)
>>>
>>> result.all()
[('Mark', 'Ares3'),
 ('Melissa', 'Ares1'),
 ('Melissa', 'Ares3'),
 ('Rick', 'Ares3')]

Generated SQL statement:

✘>>> print(query)
SELECT astronaut.firstname, mission.name
FROM astronaut JOIN mission ON astronaut.id = mission.astronaut_id

8.12.4. Join

join() will infer the left hand side automatically
Is better for continuing chain of joins

✘>>> query = (
...     select(astronaut.c.firstname, mission.c.name).
...     join(mission)
... )
>>>
>>> with engine.begin() as db:
...     result = db.execute(query)
>>>
>>> result.all()
[('Mark', 'Ares3'),
 ('Melissa', 'Ares1'),
 ('Melissa', 'Ares3'),
 ('Rick', 'Ares3')]

Generated SQL statement:

✘>>> print(query)
SELECT astronaut.firstname, mission.name
FROM astronaut JOIN mission ON astronaut.id = mission.astronaut_id

8.12.5. Join On

You can specify the column on which to perform a join
Useful when there is several ForeignKey columns
If SQLAlchemy cannot find join column automatically it throws an error

The ON clause of the JOIN is also inferred automatically from the foreign key relationship of the involved tables. We may chose to express this join condition explicitly, as would be needed if the join condition were otherwise ambiguous [1].

✘>>> query = (
...     select(astronaut.c.firstname, mission.c.name).
...     join(mission, astronaut.c.id == mission.c.astronaut_id)
... )
>>>
>>> with engine.begin() as db:
...     result = db.execute(query)
>>>
>>> result.all()
[('Mark', 'Ares3'),
 ('Melissa', 'Ares1'),
 ('Melissa', 'Ares3'),
 ('Rick', 'Ares3')]

Generated SQL statement:

✘>>> print(query)
SELECT astronaut.firstname, mission.name
FROM astronaut JOIN mission ON astronaut.id = mission.astronaut_id

8.12.6. Table Aliases

Python will use object identity to distinguish objects

When a SELECT wants to refer to the same table more than once, a SQL alias is used. This is available using the .alias() method, which returns a unique Alias object representing that table with a particular SQL alias [1].

✘>>> m1 = mission.alias()
>>> m2 = mission.alias()
>>>
>>> query = (
...     select(astronaut.c.firstname, m1.c.name, m2.c.name).
...     join_from(astronaut, m1).
...     join_from(astronaut, m2).
...     where(m1.c.name == 'Ares1').
...     where(m2.c.name == 'Ares3')
... )
>>>
>>> with engine.begin() as db:
...     result = db.execute(query)
>>>
>>> result.all()
[('Melissa', 'Ares1', 'Ares3')]

Note, using .join_from().

Use Case: When you want to get rows in two different context.

Generated SQL statement:

✘>>> print(query)
SELECT astronaut.firstname, mission_1.name, mission_2.name AS name_1
FROM astronaut JOIN mission AS mission_1 ON astronaut.id = mission_1.astronaut_id JOIN mission AS mission_2 ON astronaut.id = mission_2.astronaut_id
WHERE mission_1.name = :name_2 AND mission_2.name = :name_3

8.12.7. Subqueries

A subquery is used much like a table alias, except we start with a SELECT statement. We call the .subquery method of select() [1].

The subquery object itself has .c attribute, and is used just like a table.

✘>>> subquery = (
...     select(astronaut.c.firstname, mission.c.name).
...     join(mission).
...     subquery()
... )
>>>
>>> query = (
...     select(subquery.c.firstname).
...     where(subquery.c.firstname == 'Mark')
... )
>>>
>>> with engine.begin() as db:
...     result = db.execute(query)
>>>
>>> result.all()
[('Mark',)]

Generated SQL statement:

✘>>> print(subquery)
SELECT astronaut.firstname, mission.name
FROM astronaut JOIN mission ON astronaut.id = mission.astronaut_id
>>>
>>> print(query)
SELECT anon_1.firstname
FROM (SELECT astronaut.firstname AS firstname, mission.name AS name
FROM astronaut JOIN mission ON astronaut.id = mission.astronaut_id) AS anon_1
WHERE anon_1.firstname = :firstname_1

8.12.8. Subqueries Group By

SQLAlchemy uses column correspondence
It uses column names to identify implicit foreign keys
Example: astronaut_id will be joined with astronaut.id

With subqueries and coins we can compose more elaborate statements. This subquery introduces the func and group_by connects [1]:

We use join() to link the subquery() with another select():

✘>>> from sqlalchemy import func
>>>
>>>
>>> subquery = (
...     select(mission.c.astronaut_id,
...            func.count(mission.c.id).label('count')).
...     group_by(mission.c.astronaut_id).
...     subquery()
... )
>>>
>>> query = (
...     select(astronaut.c.firstname, subquery.c.count).
...     join(subquery).
...     order_by(astronaut.c.firstname)
... )
>>>
>>> with engine.begin() as db:
...     result = db.execute(query)
>>>
>>> result.all()
[('Mark', 1), ('Melissa', 2), ('Rick', 1)]

Note, that while using function from a func namespace, you should add a label to it, because the function results doesn't have meaningful names.

Generated SQL statement:

✘>>> print(query)
SELECT astronaut.firstname, anon_1.count
FROM astronaut JOIN (SELECT mission.astronaut_id AS astronaut_id, count(mission.id) AS count
FROM mission GROUP BY mission.astronaut_id) AS anon_1 ON astronaut.id = anon_1.astronaut_id ORDER BY astronaut.firstname

8.12.9. Common Table Expressions

CTE - Common Table Expressions
Very popular PostgreSQL feature
Could be used with SELECT, UPDATE and DELETE
Like a Subquery, but not in the FROM clause
It resides above query and uses syntax WITH
Allow recursive queries
Can produce very optimized forms
Postgres can optimize CTE better than subqueries
In SQLAlchemy it is used exactly the same way as subqueries

Joining to a subquery can also be achieved using a CTE (Common Table Expression). By calling cte() instead of subquery(), we get a CTE [1]:

We SELECT/JOIN to the CTE in exactly the same way as we did the subquery:

✘>>> from sqlalchemy import func
>>>
>>>
>>> cte = (
...     select(mission.c.astronaut_id,
...            func.count(mission.c.id).label('count')).
...     group_by(mission.c.astronaut_id).
...     cte()
... )
>>>
>>> query = (
...     select(astronaut.c.firstname, cte.c.count).
...     join(cte).
...     order_by(astronaut.c.firstname)
... )
>>>
>>> with engine.begin() as db:
...     result = db.execute(query)
>>>
>>> result.all()
[('Mark', 1), ('Melissa', 2), ('Rick', 1)]

Generated SQL statement:

✘>>> print(query)
WITH anon_1 AS
(SELECT mission.astronaut_id AS astronaut_id, count(mission.id) AS count
FROM mission GROUP BY mission.astronaut_id)
 SELECT astronaut.firstname, anon_1.count
FROM astronaut JOIN anon_1 ON astronaut.id = anon_1.astronaut_id ORDER BY astronaut.firstname

8.12.10. Correlated Subqueries

A subquery in the columns clause or in the WHERE clause of the enclosing SELECT statement
Should return exactly one row and one column
Used as a column expression in bigger column query

A 'scalar subquery' returns exactly one row and one column. We indicate this intent using the scalar_subquery() method after construction [1].

✘>>> from sqlalchemy import func
>>>
>>>
>>> corr = (
...     select(func.count(mission.c.id)).
...     where(astronaut.c.id == mission.c.astronaut_id).
...     scalar_subquery()
... )
>>>
>>> print(corr)
(SELECT count(mission.id) AS count_1
FROM mission, astronaut
WHERE astronaut.id = mission.astronaut_id)

The subquery here refers to two tables. Printing it alone, we can see both tables in the FROM clause.

However, a scalar subquery will by default 'auto correlate' in a larger SQL expression, omitting a FROM that is found in the immediate enclosing SELECT.

✘>>> query = select(astronaut.c.firstname, corr)
>>>
>>> print(query)
SELECT astronaut.firstname, (SELECT count(mission.id) AS count_1
FROM mission
WHERE astronaut.id = mission.astronaut_id) AS anon_1
FROM astronaut

8.12. SQLAlchemy Core Join

8.12.1. SetUp

8.12.2. Select Multiple Tables

8.12.3. Join From

8.12.4. Join

8.12.5. Join On

8.12.6. Table Aliases

8.12.7. Subqueries

8.12.8. Subqueries Group By

8.12.9. Common Table Expressions

8.12.10. Correlated Subqueries

8.12.11. References