SQLAlchemy 2.x:立即加载连接的集合查询

huangapple go评论89阅读模式
英文:

SQLAlchemy 2.x: Eagerly load joined collection query

问题

上下文

使用SQLAlchemy 2.x,如何急切地加载一个连接的集合?

假设我们有以下模型 ParentChild

class Parent(Base):
    __tablename__ = "parent"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))

    children: Mapped[List["Child"]] = relationship(
        back_populates="parent", cascade="all, delete-orphan"
    )

    def __repr__(self) -> str:
        return f"Parent(id={self.id!r}, name={self.name!r})"

class Child(Base):
    __tablename__ = "child"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))

    parent_id: Mapped[int] = mapped_column(ForeignKey("parent.id"))
    parent: Mapped["Parent"] = relationship(back_populates="children")

    def __repr__(self) -> str:
        return f"Child(id={self.id!r}, name={self.name!r})"

我想要获取具有 id 等于 1 的 ChildParent。并填充结果 parent.children 与其所有子项。

换句话说,如果我们的 parentchild 表已经填充了:

parent
id  name
1   p1
2   p2
child
id  name  parent_id
1   c1    1
2   c2    1
3   c2    1
4   c3    1
5   c4    2

我想看到查询结果对象:

result = <query parent whose children has id == 1>
print(result)
>>> Parent(id=1, name='p1')

print(result.children)
>>> [
        Child(id=1, name='c1'),
        Child(id=2, name='c2'),
        Child(id=3, name='c3'),
        Child(id=4, name='c4'),
    ]

测试用例 1

stmt = select(Parent).join(Parent.children).where(Child.id == 1)

生成以下SQL:

SELECT parent.id, parent.name 
FROM parent JOIN child ON parent.id = child.parent_id 
WHERE child.id = 1

看起来很不错,但由于我没有告诉SQLAlchemy要急切地加载 children,当访问结果标量对象(parent.children)时,我会得到以下错误:

sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_only() here. Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/20/xd2s)

测试用例 2

stmt = select(Parent).options(joinedload(Parent.children)).where(Child.id == 1)

生成以下SQL:

SELECT parent.id, parent.name, child_1.id AS id_1, child_1.name AS name_1, child_1.parent_id 
FROM parent JOIN child AS child_1 ON parent.id = child_1.parent_id, child 
WHERE child.id = 1

这不是我们想要的,注意现在 child 出现在 FROM 子句中。

英文:

Context

With SQLAlchemy 2.x, how to eagerly load a joined collection?

Let's say we have the following models Parent and Child:

class Parent(Base):
    __tablename__ = "parent"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))

    children: Mapped[List["Child"]] = relationship(
        back_populates="parent", cascade="all, delete-orphan"
    )

    def __repr__(self) -> str:
        return f"Parent(id={self.id!r}, name={self.name!r})"

class Child(Base):
    __tablename__ = "child"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))

    parent_id: Mapped[int] = mapped_column(ForeignKey("parent.id"))
    parent: Mapped["Parent"] = relationship(back_populates="children")

    def __repr__(self) -> str:
        return f"Child(id={self.id!r}, name={self.name!r})"

And I would like to get the Parent that has a Child with id equal to 1. And populate the result parent.children with all its children.

IOW, if our parent and child tables are populated with:

parent
id  name
1   p1
2   p2
child
id  name  parent_id
1   c1    1
2   c2    1
3   c2    1
4   c3    1
5   c4    2

I would like to see the query result object:

result = <query parent whose children has id == 1>
print(result)
>>> Parent(id=1, name='p1')

print(result.children)
>>> [
        Child(id=1, name='c1'),
        Child(id=2, name='c2'),
        Child(id=3, name='c3'),
        Child(id=4, name='c4'),
    ]

Test case 1

stmt = select(Parent).join(Parent.children).where(Child.id == 1)

Generates the following SQL:

SELECT parent.id, parent.name 
FROM parent JOIN child ON parent.id = child.parent_id 
WHERE child.id = 1

Which looks great, but since I don't tell sqlalchemy to eagerly load children, when accessing the result scalar object (parent.children), I get the error:

sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_only() here. Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/20/xd2s)

Test case 2

stmt = select(Parent).options(joinedload(Parent.children)).where(Child.id == 1)

Generates the following SQL:

SELECT parent.id, parent.name, child_1.id AS id_1, child_1.name AS name_1, child_1.parent_id 
FROM parent JOIN child AS child_1 ON parent.id = child_1.parent_id, child 
WHERE child.id = 1

Which is not what we're looking for, notice child now appears in the FROM clause.

答案1

得分: 1

我有点吃惊,但使用别名加入似乎可以工作,但似乎以后可能会有问题。我有点喜欢子查询,这可能在以后更加灵活,甚至可能更快。

我没有使用异步测试,但我认为它应该可以工作。

英文:

I was kind of suprised but joining with an alias appears to work but it seems like a footgun for later. I'm kind of a big fan of subqueries and that might be more flexible later on and maybe even faster.

I did not test this with async but I think it should work.

2023-05-27 15:44:06,791 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-05-27 15:44:06,809 INFO sqlalchemy.engine.Engine SELECT parents.id, parents.name, childs_1.id AS id_1, childs_1.name AS name_1, childs_1.parent_id 
FROM parents JOIN childs AS childs_2 ON parents.id = childs_2.parent_id LEFT OUTER JOIN childs AS childs_1 ON parents.id = childs_1.parent_id 
WHERE childs_2.id = %(id_2)s
2023-05-27 15:44:06,809 INFO sqlalchemy.engine.Engine [generated in 0.00066s] {'id_2': 1}
2023-05-27 15:44:06,814 INFO sqlalchemy.engine.Engine ROLLBACK
2023-05-27 15:44:06,815 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-05-27 15:44:06,817 INFO sqlalchemy.engine.Engine SELECT parents.id, parents.name, childs_1.id AS id_1, childs_1.name AS name_1, childs_1.parent_id 
FROM parents LEFT OUTER JOIN childs AS childs_1 ON parents.id = childs_1.parent_id 
WHERE parents.id IN (SELECT childs.parent_id 
FROM childs 
WHERE childs.id = %(id_2)s)
2023-05-27 15:44:06,817 INFO sqlalchemy.engine.Engine [generated in 0.00018s] {'id_2': 1}
2023-05-27 15:44:06,819 INFO sqlalchemy.engine.Engine ROLLBACK
import sys
from sqlalchemy import (
    create_engine,
    Integer,
    String,
)
from sqlalchemy.schema import (
    Column,
    ForeignKey,
)
from sqlalchemy.sql import select
from sqlalchemy.orm import declarative_base, Session, aliased, relationship, joinedload


Base = declarative_base()


username, password, db = sys.argv[1:4]


engine = create_engine(f"postgresql+psycopg2://{username}:{password}@/{db}", echo=True)


class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    childs = relationship("Child", back_populates="parent")


class Child(Base):
    __tablename__ = 'childs'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    parent_id = Column(Integer, ForeignKey('parents.id'), nullable=False)
    parent = relationship("Parent", back_populates="childs")


Base.metadata.create_all(engine)

parents = 
for line in """
1 p1 2 p2 """.splitlines() if line.strip()] # Changed child names to be consistent. childs =
for line in """
1 c1 1 2 c2 1 3 c3 1 4 c4 1 5 c5 2 """.splitlines() if line.strip()] with Session(engine) as session: session.add_all([Parent(id=int(parent[0]), name=parent[1]) for parent in parents]) session.add_all([Child(id=int(child[0]), name=child[1], parent_id=int(child[2])) for child in childs]) session.commit() def test_results(parents): assert len(parents) == 1 assert len(parents[0].childs) == 4 assert list(sorted([child.id for child in parents[0].childs])) == [1, 2, 3, 4] # using aliased to prevent join conflict with Session(engine) as session: c2 = aliased(Child) q = select(Parent).join(c2, Parent.id == c2.parent_id).where(c2.id == 1).options(joinedload(Parent.childs)) parents = [parent for parent in session.scalars(q).unique()] test_results(parents) # subquery with Session(engine) as session: parents = list(session.scalars(select(Parent).where(Parent.id.in_(select(Child.parent_id).where(Child.id == 1))).options(joinedload(Parent.childs))).unique()) test_results(parents)

答案2

得分: 0

以下是您要翻译的内容:

"The answer @IanWilson wrote is quite excellent, but to expand on the subquery case and expand on his comment about selectinload, I decided to also contribute an answer.

If you know for a fact that a child will only have one parent, the IN comparison can be fully avoided by using the standard ==, like the following example:

with Session(engine) as session:
    # The parent with Child.id == 1, which contains it plus its siblings
    query = (
        select(Parent)
        .join(Parent.children)  # plurals are hard (childs isn't quite right)
        .where(Parent.id ==
            select(Child.parent_id)
            .where(Child.id == 1)
            .scalar_subquery()  # to ensure a single result returned here
        )
        .options(selectinload(Parent.children))
    )
    result = session.execute(query).unique().all()
    print(result)

Do note that there is an additional function call attached to the subquery - .scalar_subquery() is needed otherwise this SAWarning is triggered:

SAWarning: implicitly coercing SELECT object to scalar subquery; please use the .scalar_subquery() method to produce a scalar subquery.

Also in my example I used selectinload (from sqlalchemy.orm) as an option as suggested in the answer I referenced to show what would happen - the following queries would instead be emitted:

INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:SELECT parent.id, parent.name 
FROM parent JOIN child ON parent.id = child.parent_id 
WHERE parent.id = (SELECT child.parent_id 
FROM child 
WHERE child.id = ?)
INFO:sqlalchemy.engine.Engine:[generated in 0.00014s] (1,)
INFO:sqlalchemy.engine.Engine:SELECT child.parent_id AS child_parent_id, child.id AS child_id, child.name AS child_name 
FROM child 
WHERE child.parent_id IN (?)
INFO:sqlalchemy.engine.Engine:[generated in 0.00016s] (1,)
INFO:sqlalchemy.engine.Engine:ROLLBACK

Splitting the query up may be necessary - as query using the either contains_eager and joinedload option will result in a JOIN query of some kind which would result in all values from the left hand side (in this case the Parent) be repeated for every Child rows that gets returned by the query. Not too much of a huge deal in this small example, but this can get very expensive quickly if there are many columns in Parent (and worse if they are very large columns) and/or there are many Child rows that will be returned by the query due to how a join query involves doing a Cartesian product (multiplication) between the two sides of the join, thus this strategy is generally not recommended for 1-N or N-to-N relationships. A funny quote (also from that thread) about why doing the join query might be bad, "I was a witness when someone performed several joinedloads and DB returned output 25 times bigger than entire database :)".

For some additional background, this answer goes into a bit more details about joinedload vs selectinload; this answer discusses the difference between joinedload and contains_eager as to the difference between them, and an example on the latter."

英文:

The answer @IanWilson wrote is quite excellent, but to expand on the subquery case and expand on his comment about selectinload, I decided to also contribute an answer.

If you know for a fact that a child will only have one parent, the IN comparison can be fully avoided by using the standard ==, like the following example:

with Session(engine) as session:
    # The parent with Child.id == 1, which contains it plus its siblings
    query = (
        select(Parent)
        .join(Parent.children)  # plurals are hard (childs isn't quite right)
        .where(Parent.id ==
            select(Child.parent_id)
            .where(Child.id == 1)
            .scalar_subquery()  # to ensure a single result returned here
        )
        .options(selectinload(Parent.children))
    )
    result = session.execute(query).unique().all()
    print(result)

Do note that there is an additional function call attached to the subquery - .scalar_subquery() is needed otherwise this SAWarning is triggered:

SAWarning: implicitly coercing SELECT object to scalar subquery; please use the .scalar_subquery() method to produce a scalar subquery.

Also in my example I used selectinload (from sqlalchemy.orm) as an option as suggested in the answer I referenced to show what would happen - the following queries would instead be emitted:

INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:SELECT parent.id, parent.name 
FROM parent JOIN child ON parent.id = child.parent_id 
WHERE parent.id = (SELECT child.parent_id 
FROM child 
WHERE child.id = ?)
INFO:sqlalchemy.engine.Engine:[generated in 0.00014s] (1,)
INFO:sqlalchemy.engine.Engine:SELECT child.parent_id AS child_parent_id, child.id AS child_id, child.name AS child_name 
FROM child 
WHERE child.parent_id IN (?)
INFO:sqlalchemy.engine.Engine:[generated in 0.00016s] (1,)
INFO:sqlalchemy.engine.Engine:ROLLBACK

Splitting the query up may be necessary - as query using the either contains_eager and joinedload option will result in a JOIN query of some kind which would result in all values from the left hand side (in this case the Parent) be repeated for every Child rows that gets returned by the query. Not too much of a huge deal in this small example, but this can get very expensive quickly if there are many columns in Parent (and worse if they are very large columns) and/or there are many Child rows that will be returned by the query due to how a join query involves doing a Cartesian product (multiplication) between the two sides of the join, thus this strategy is generally not recommended for 1-N or N-to-N relationships. A funny quote (also from that thread) about why doing the join query might be bad, "I was a witness when someone performed several joinedloads and DB returned output 25 times bigger than entire database :)".

For some additional background, this answer goes into a bit more details about joinedload vs selectinload; this answer discusses the difference between joinedload and contains_eager as to the difference between them, and an example on the latter.

huangapple
  • 本文由 发表于 2023年5月28日 04:28:12
  • 转载请务必保留本文链接:https://go.coder-hub.com/76348906.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定