2023年5月28日 04:28:12go评论124阅读模式

英文:

SQLAlchemy 2.x: Eagerly load joined collection query

问题

上下文

使用SQLAlchemy 2.x，如何急切地加载一个连接的集合？

假设我们有以下模型 Parent 和 Child：

class Parent(Base):
    __tablename__ = &quot;parent&quot;
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))
    children: Mapped[List[&quot;Child&quot;]] = relationship(
        back_populates=&quot;parent&quot;, cascade=&quot;all, delete-orphan&quot;
    )
    def __repr__(self) -&gt; str:
        return f&quot;Parent(id={self.id!r}, name={self.name!r})&quot;
class Child(Base):
    __tablename__ = &quot;child&quot;
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))
    parent_id: Mapped[int] = mapped_column(ForeignKey(&quot;parent.id&quot;))
    parent: Mapped[&quot;Parent&quot;] = relationship(back_populates=&quot;children&quot;)
    def __repr__(self) -&gt; str:
        return f&quot;Child(id={self.id!r}, name={self.name!r})&quot;

我想要获取具有 id 等于 1 的 Child 的 Parent。并填充结果 parent.children 与其所有子项。

换句话说，如果我们的 parent 和 child 表已经填充了：

parent
id  name
1   p1
2   p2

child
id  name  parent_id
1   c1    1
2   c2    1
3   c2    1
4   c3    1
5   c4    2

我想看到查询结果对象：

result = &lt;query parent whose children has id == 1&gt;
print(result)
&gt;&gt;&gt; Parent(id=1, name=&#39;p1&#39;)
print(result.children)
&gt;&gt;&gt; [
        Child(id=1, name=&#39;c1&#39;),
        Child(id=2, name=&#39;c2&#39;),
        Child(id=3, name=&#39;c3&#39;),
        Child(id=4, name=&#39;c4&#39;),
    ]

测试用例 1

stmt = select(Parent).join(Parent.children).where(Child.id == 1)

生成以下SQL：

SELECT parent.id, parent.name 
FROM parent JOIN child ON parent.id = child.parent_id 
WHERE child.id = 1

看起来很不错，但由于我没有告诉SQLAlchemy要急切地加载 children，当访问结果标量对象（parent.children）时，我会得到以下错误：

sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_only() here. Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/20/xd2s)

测试用例 2

stmt = select(Parent).options(joinedload(Parent.children)).where(Child.id == 1)

生成以下SQL：

SELECT parent.id, parent.name, child_1.id AS id_1, child_1.name AS name_1, child_1.parent_id 
FROM parent JOIN child AS child_1 ON parent.id = child_1.parent_id, child 
WHERE child.id = 1

这不是我们想要的，注意现在 child 出现在 FROM 子句中。

英文:

Context

With SQLAlchemy 2.x, how to eagerly load a joined collection?

Let's say we have the following models Parent and Child:

class Parent(Base):
    __tablename__ = &quot;parent&quot;
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))
    children: Mapped[List[&quot;Child&quot;]] = relationship(
        back_populates=&quot;parent&quot;, cascade=&quot;all, delete-orphan&quot;
    )
    def __repr__(self) -&gt; str:
        return f&quot;Parent(id={self.id!r}, name={self.name!r})&quot;
class Child(Base):
    __tablename__ = &quot;child&quot;
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))
    parent_id: Mapped[int] = mapped_column(ForeignKey(&quot;parent.id&quot;))
    parent: Mapped[&quot;Parent&quot;] = relationship(back_populates=&quot;children&quot;)
    def __repr__(self) -&gt; str:
        return f&quot;Child(id={self.id!r}, name={self.name!r})&quot;

And I would like to get the Parent that has a Child with id equal to 1. And populate the result parent.children with all its children.

IOW, if our parent and child tables are populated with:

parent
id  name
1   p1
2   p2

child
id  name  parent_id
1   c1    1
2   c2    1
3   c2    1
4   c3    1
5   c4    2

I would like to see the query result object:

result = &lt;query parent whose children has id == 1&gt;
print(result)
&gt;&gt;&gt; Parent(id=1, name=&#39;p1&#39;)
print(result.children)
&gt;&gt;&gt; [
        Child(id=1, name=&#39;c1&#39;),
        Child(id=2, name=&#39;c2&#39;),
        Child(id=3, name=&#39;c3&#39;),
        Child(id=4, name=&#39;c4&#39;),
    ]

Test case 1

stmt = select(Parent).join(Parent.children).where(Child.id == 1)

Generates the following SQL:

SELECT parent.id, parent.name 
FROM parent JOIN child ON parent.id = child.parent_id 
WHERE child.id = 1

Which looks great, but since I don't tell sqlalchemy to eagerly load children, when accessing the result scalar object (parent.children), I get the error:

sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can&#39;t call await_only() here. Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/20/xd2s)

Test case 2

stmt = select(Parent).options(joinedload(Parent.children)).where(Child.id == 1)

Generates the following SQL:

SELECT parent.id, parent.name, child_1.id AS id_1, child_1.name AS name_1, child_1.parent_id 
FROM parent JOIN child AS child_1 ON parent.id = child_1.parent_id, child 
WHERE child.id = 1

Which is not what we're looking for, notice child now appears in the FROM clause.

答案1

得分: 1

我有点吃惊，但使用别名加入似乎可以工作，但似乎以后可能会有问题。我有点喜欢子查询，这可能在以后更加灵活，甚至可能更快。

我没有使用异步测试，但我认为它应该可以工作。

英文:

I was kind of suprised but joining with an alias appears to work but it seems like a footgun for later. I'm kind of a big fan of subqueries and that might be more flexible later on and maybe even faster.

I did not test this with async but I think it should work.

2023-05-27 15:44:06,791 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-05-27 15:44:06,809 INFO sqlalchemy.engine.Engine SELECT parents.id, parents.name, childs_1.id AS id_1, childs_1.name AS name_1, childs_1.parent_id 
FROM parents JOIN childs AS childs_2 ON parents.id = childs_2.parent_id LEFT OUTER JOIN childs AS childs_1 ON parents.id = childs_1.parent_id 
WHERE childs_2.id = %(id_2)s
2023-05-27 15:44:06,809 INFO sqlalchemy.engine.Engine [generated in 0.00066s] {&#39;id_2&#39;: 1}
2023-05-27 15:44:06,814 INFO sqlalchemy.engine.Engine ROLLBACK
2023-05-27 15:44:06,815 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-05-27 15:44:06,817 INFO sqlalchemy.engine.Engine SELECT parents.id, parents.name, childs_1.id AS id_1, childs_1.name AS name_1, childs_1.parent_id 
FROM parents LEFT OUTER JOIN childs AS childs_1 ON parents.id = childs_1.parent_id 
WHERE parents.id IN (SELECT childs.parent_id 
FROM childs 
WHERE childs.id = %(id_2)s)
2023-05-27 15:44:06,817 INFO sqlalchemy.engine.Engine [generated in 0.00018s] {&#39;id_2&#39;: 1}
2023-05-27 15:44:06,819 INFO sqlalchemy.engine.Engine ROLLBACK

import sys
from sqlalchemy import (
    create_engine,
    Integer,
    String,
)
from sqlalchemy.schema import (
    Column,
    ForeignKey,
)
from sqlalchemy.sql import select
from sqlalchemy.orm import declarative_base, Session, aliased, relationship, joinedload
Base = declarative_base()
username, password, db = sys.argv[1:4]
engine = create_engine(f&quot;postgresql+psycopg2://{username}:{password}@/{db}&quot;, echo=True)
class Parent(Base):
    __tablename__ = &quot;parents&quot;
    id = Column(Integer, primary_key=True)
    name = Column(String)
    childs = relationship(&quot;Child&quot;, back_populates=&quot;parent&quot;)
class Child(Base):
    __tablename__ = &#39;childs&#39;
    id = Column(Integer, primary_key=True)
    name = Column(String)
    parent_id = Column(Integer, ForeignKey(&#39;parents.id&#39;), nullable=False)
    parent = relationship(&quot;Parent&quot;, back_populates=&quot;childs&quot;)
Base.metadata.create_all(engine)
parents = 
 for line in &quot;&quot;&quot;
1   p1
2   p2
&quot;&quot;&quot;.splitlines() if line.strip()]
# Changed child names to be consistent.
childs = 
 for line in &quot;&quot;&quot;
1   c1    1
2   c2    1
3   c3    1
4   c4    1
5   c5    2
&quot;&quot;&quot;.splitlines() if line.strip()]
with Session(engine) as session:
    session.add_all([Parent(id=int(parent[0]), name=parent[1]) for parent in parents])
    session.add_all([Child(id=int(child[0]), name=child[1], parent_id=int(child[2])) for child in childs])
    session.commit()
def test_results(parents):
    assert len(parents) == 1
    assert len(parents[0].childs) == 4
    assert list(sorted([child.id for child in parents[0].childs])) == [1, 2, 3, 4]
# using aliased to prevent join conflict
with Session(engine) as session:
    c2 = aliased(Child)
    q = select(Parent).join(c2, Parent.id == c2.parent_id).where(c2.id == 1).options(joinedload(Parent.childs))
    parents = [parent for parent in session.scalars(q).unique()]
    test_results(parents)
# subquery
with Session(engine) as session:
    parents = list(session.scalars(select(Parent).where(Parent.id.in_(select(Child.parent_id).where(Child.id == 1))).options(joinedload(Parent.childs))).unique())
    test_results(parents)

答案2

得分: 0

以下是您要翻译的内容：

"The answer @IanWilson wrote is quite excellent, but to expand on the subquery case and expand on his comment about selectinload, I decided to also contribute an answer.

If you know for a fact that a child will only have one parent, the IN comparison can be fully avoided by using the standard ==, like the following example:

with Session(engine) as session:
    # The parent with Child.id == 1, which contains it plus its siblings
    query = (
        select(Parent)
        .join(Parent.children)  # plurals are hard (childs isn't quite right)
        .where(Parent.id ==
            select(Child.parent_id)
            .where(Child.id == 1)
            .scalar_subquery()  # to ensure a single result returned here
        )
        .options(selectinload(Parent.children))
    )
    result = session.execute(query).unique().all()
    print(result)

Do note that there is an additional function call attached to the subquery - .scalar_subquery() is needed otherwise this SAWarning is triggered:

SAWarning: implicitly coercing SELECT object to scalar subquery; please use the .scalar_subquery() method to produce a scalar subquery.

Also in my example I used selectinload (from sqlalchemy.orm) as an option as suggested in the answer I referenced to show what would happen - the following queries would instead be emitted:

INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:SELECT parent.id, parent.name 
FROM parent JOIN child ON parent.id = child.parent_id 
WHERE parent.id = (SELECT child.parent_id 
FROM child 
WHERE child.id = ?)
INFO:sqlalchemy.engine.Engine:[generated in 0.00014s] (1,)
INFO:sqlalchemy.engine.Engine:SELECT child.parent_id AS child_parent_id, child.id AS child_id, child.name AS child_name 
FROM child 
WHERE child.parent_id IN (?)
INFO:sqlalchemy.engine.Engine:[generated in 0.00016s] (1,)
INFO:sqlalchemy.engine.Engine:ROLLBACK

Splitting the query up may be necessary - as query using the either contains_eager and joinedload option will result in a JOIN query of some kind which would result in all values from the left hand side (in this case the Parent) be repeated for every Child rows that gets returned by the query. Not too much of a huge deal in this small example, but this can get very expensive quickly if there are many columns in Parent (and worse if they are very large columns) and/or there are many Child rows that will be returned by the query due to how a join query involves doing a Cartesian product (multiplication) between the two sides of the join, thus this strategy is generally not recommended for 1-N or N-to-N relationships. A funny quote (also from that thread) about why doing the join query might be bad, "I was a witness when someone performed several joinedloads and DB returned output 25 times bigger than entire database :)".

For some additional background, this answer goes into a bit more details about joinedload vs selectinload; this answer discusses the difference between joinedload and contains_eager as to the difference between them, and an example on the latter."

英文:

The answer @IanWilson wrote is quite excellent, but to expand on the subquery case and expand on his comment about selectinload, I decided to also contribute an answer.

If you know for a fact that a child will only have one parent, the IN comparison can be fully avoided by using the standard ==, like the following example:

with Session(engine) as session:
    # The parent with Child.id == 1, which contains it plus its siblings
    query = (
        select(Parent)
        .join(Parent.children)  # plurals are hard (childs isn&#39;t quite right)
        .where(Parent.id ==
            select(Child.parent_id)
            .where(Child.id == 1)
            .scalar_subquery()  # to ensure a single result returned here
        )
        .options(selectinload(Parent.children))
    )
    result = session.execute(query).unique().all()
    print(result)

Do note that there is an additional function call attached to the subquery - .scalar_subquery() is needed otherwise this SAWarning is triggered:

SAWarning: implicitly coercing SELECT object to scalar subquery; please use the .scalar_subquery() method to produce a scalar subquery.

Also in my example I used selectinload (from sqlalchemy.orm) as an option as suggested in the answer I referenced to show what would happen - the following queries would instead be emitted:

INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:SELECT parent.id, parent.name 
FROM parent JOIN child ON parent.id = child.parent_id 
WHERE parent.id = (SELECT child.parent_id 
FROM child 
WHERE child.id = ?)
INFO:sqlalchemy.engine.Engine:[generated in 0.00014s] (1,)
INFO:sqlalchemy.engine.Engine:SELECT child.parent_id AS child_parent_id, child.id AS child_id, child.name AS child_name 
FROM child 
WHERE child.parent_id IN (?)
INFO:sqlalchemy.engine.Engine:[generated in 0.00016s] (1,)
INFO:sqlalchemy.engine.Engine:ROLLBACK

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

SQLAlchemy 2.x：立即加载连接的集合查询

问题

上下文

测试用例 1

测试用例 2

Context

Test case 1

Test case 2

答案1

答案2

Python，循环读取Excel文件的工作表，更改标题行号

分离日期和年份，转换为日期时间，并创建一个新列

SQL执行时间太长。如何优化这个查询？

multiprocessing worker code not executing

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。