英文:
SQLAlchemy 2.x: Eagerly load joined collection query
问题
上下文
使用SQLAlchemy 2.x,如何急切地加载一个连接的集合?
假设我们有以下模型 Parent
和 Child
:
class Parent(Base):
__tablename__ = "parent"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str] = mapped_column(String(30))
children: Mapped[List["Child"]] = relationship(
back_populates="parent", cascade="all, delete-orphan"
)
def __repr__(self) -> str:
return f"Parent(id={self.id!r}, name={self.name!r})"
class Child(Base):
__tablename__ = "child"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str] = mapped_column(String(30))
parent_id: Mapped[int] = mapped_column(ForeignKey("parent.id"))
parent: Mapped["Parent"] = relationship(back_populates="children")
def __repr__(self) -> str:
return f"Child(id={self.id!r}, name={self.name!r})"
我想要获取具有 id
等于 1 的 Child
的 Parent
。并填充结果 parent.children
与其所有子项。
换句话说,如果我们的 parent
和 child
表已经填充了:
parent
id name
1 p1
2 p2
child
id name parent_id
1 c1 1
2 c2 1
3 c2 1
4 c3 1
5 c4 2
我想看到查询结果对象:
result = <query parent whose children has id == 1>
print(result)
>>> Parent(id=1, name='p1')
print(result.children)
>>> [
Child(id=1, name='c1'),
Child(id=2, name='c2'),
Child(id=3, name='c3'),
Child(id=4, name='c4'),
]
测试用例 1
stmt = select(Parent).join(Parent.children).where(Child.id == 1)
生成以下SQL:
SELECT parent.id, parent.name
FROM parent JOIN child ON parent.id = child.parent_id
WHERE child.id = 1
看起来很不错,但由于我没有告诉SQLAlchemy要急切地加载 children
,当访问结果标量对象(parent.children
)时,我会得到以下错误:
sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_only() here. Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/20/xd2s)
测试用例 2
stmt = select(Parent).options(joinedload(Parent.children)).where(Child.id == 1)
生成以下SQL:
SELECT parent.id, parent.name, child_1.id AS id_1, child_1.name AS name_1, child_1.parent_id
FROM parent JOIN child AS child_1 ON parent.id = child_1.parent_id, child
WHERE child.id = 1
这不是我们想要的,注意现在 child
出现在 FROM
子句中。
英文:
Context
With SQLAlchemy 2.x, how to eagerly load a joined collection?
Let's say we have the following models Parent
and Child
:
class Parent(Base):
__tablename__ = "parent"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str] = mapped_column(String(30))
children: Mapped[List["Child"]] = relationship(
back_populates="parent", cascade="all, delete-orphan"
)
def __repr__(self) -> str:
return f"Parent(id={self.id!r}, name={self.name!r})"
class Child(Base):
__tablename__ = "child"
id: Mapped[int] = mapped_column(primary_key=True)
name: Mapped[str] = mapped_column(String(30))
parent_id: Mapped[int] = mapped_column(ForeignKey("parent.id"))
parent: Mapped["Parent"] = relationship(back_populates="children")
def __repr__(self) -> str:
return f"Child(id={self.id!r}, name={self.name!r})"
And I would like to get the Parent
that has a Child
with id
equal to 1. And populate the result parent.children
with all its children.
IOW, if our parent
and child
tables are populated with:
parent
id name
1 p1
2 p2
child
id name parent_id
1 c1 1
2 c2 1
3 c2 1
4 c3 1
5 c4 2
I would like to see the query result object:
result = <query parent whose children has id == 1>
print(result)
>>> Parent(id=1, name='p1')
print(result.children)
>>> [
Child(id=1, name='c1'),
Child(id=2, name='c2'),
Child(id=3, name='c3'),
Child(id=4, name='c4'),
]
Test case 1
stmt = select(Parent).join(Parent.children).where(Child.id == 1)
Generates the following SQL:
SELECT parent.id, parent.name
FROM parent JOIN child ON parent.id = child.parent_id
WHERE child.id = 1
Which looks great, but since I don't tell sqlalchemy to eagerly load children
, when accessing the result scalar object (parent.children
), I get the error:
sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_only() here. Was IO attempted in an unexpected place? (Background on this error at: https://sqlalche.me/e/20/xd2s)
Test case 2
stmt = select(Parent).options(joinedload(Parent.children)).where(Child.id == 1)
Generates the following SQL:
SELECT parent.id, parent.name, child_1.id AS id_1, child_1.name AS name_1, child_1.parent_id
FROM parent JOIN child AS child_1 ON parent.id = child_1.parent_id, child
WHERE child.id = 1
Which is not what we're looking for, notice child
now appears in the FROM
clause.
答案1
得分: 1
我有点吃惊,但使用别名加入似乎可以工作,但似乎以后可能会有问题。我有点喜欢子查询,这可能在以后更加灵活,甚至可能更快。
我没有使用异步测试,但我认为它应该可以工作。
英文:
I was kind of suprised but joining with an alias appears to work but it seems like a footgun for later. I'm kind of a big fan of subqueries and that might be more flexible later on and maybe even faster.
I did not test this with async but I think it should work.
2023-05-27 15:44:06,791 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-05-27 15:44:06,809 INFO sqlalchemy.engine.Engine SELECT parents.id, parents.name, childs_1.id AS id_1, childs_1.name AS name_1, childs_1.parent_id
FROM parents JOIN childs AS childs_2 ON parents.id = childs_2.parent_id LEFT OUTER JOIN childs AS childs_1 ON parents.id = childs_1.parent_id
WHERE childs_2.id = %(id_2)s
2023-05-27 15:44:06,809 INFO sqlalchemy.engine.Engine [generated in 0.00066s] {'id_2': 1}
2023-05-27 15:44:06,814 INFO sqlalchemy.engine.Engine ROLLBACK
2023-05-27 15:44:06,815 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-05-27 15:44:06,817 INFO sqlalchemy.engine.Engine SELECT parents.id, parents.name, childs_1.id AS id_1, childs_1.name AS name_1, childs_1.parent_id
FROM parents LEFT OUTER JOIN childs AS childs_1 ON parents.id = childs_1.parent_id
WHERE parents.id IN (SELECT childs.parent_id
FROM childs
WHERE childs.id = %(id_2)s)
2023-05-27 15:44:06,817 INFO sqlalchemy.engine.Engine [generated in 0.00018s] {'id_2': 1}
2023-05-27 15:44:06,819 INFO sqlalchemy.engine.Engine ROLLBACK
import sys
from sqlalchemy import (
create_engine,
Integer,
String,
)
from sqlalchemy.schema import (
Column,
ForeignKey,
)
from sqlalchemy.sql import select
from sqlalchemy.orm import declarative_base, Session, aliased, relationship, joinedload
Base = declarative_base()
username, password, db = sys.argv[1:4]
engine = create_engine(f"postgresql+psycopg2://{username}:{password}@/{db}", echo=True)
class Parent(Base):
__tablename__ = "parents"
id = Column(Integer, primary_key=True)
name = Column(String)
childs = relationship("Child", back_populates="parent")
class Child(Base):
__tablename__ = 'childs'
id = Column(Integer, primary_key=True)
name = Column(String)
parent_id = Column(Integer, ForeignKey('parents.id'), nullable=False)
parent = relationship("Parent", back_populates="childs")
Base.metadata.create_all(engine)
parents = for line in """
1 p1
2 p2
""".splitlines() if line.strip()]
# Changed child names to be consistent.
childs = for line in """
1 c1 1
2 c2 1
3 c3 1
4 c4 1
5 c5 2
""".splitlines() if line.strip()]
with Session(engine) as session:
session.add_all([Parent(id=int(parent[0]), name=parent[1]) for parent in parents])
session.add_all([Child(id=int(child[0]), name=child[1], parent_id=int(child[2])) for child in childs])
session.commit()
def test_results(parents):
assert len(parents) == 1
assert len(parents[0].childs) == 4
assert list(sorted([child.id for child in parents[0].childs])) == [1, 2, 3, 4]
# using aliased to prevent join conflict
with Session(engine) as session:
c2 = aliased(Child)
q = select(Parent).join(c2, Parent.id == c2.parent_id).where(c2.id == 1).options(joinedload(Parent.childs))
parents = [parent for parent in session.scalars(q).unique()]
test_results(parents)
# subquery
with Session(engine) as session:
parents = list(session.scalars(select(Parent).where(Parent.id.in_(select(Child.parent_id).where(Child.id == 1))).options(joinedload(Parent.childs))).unique())
test_results(parents)
答案2
得分: 0
以下是您要翻译的内容:
"The answer @IanWilson wrote is quite excellent, but to expand on the subquery case and expand on his comment about selectinload
, I decided to also contribute an answer.
If you know for a fact that a child will only have one parent, the IN
comparison can be fully avoided by using the standard ==
, like the following example:
with Session(engine) as session:
# The parent with Child.id == 1, which contains it plus its siblings
query = (
select(Parent)
.join(Parent.children) # plurals are hard (childs isn't quite right)
.where(Parent.id ==
select(Child.parent_id)
.where(Child.id == 1)
.scalar_subquery() # to ensure a single result returned here
)
.options(selectinload(Parent.children))
)
result = session.execute(query).unique().all()
print(result)
Do note that there is an additional function call attached to the subquery - .scalar_subquery()
is needed otherwise this SAWarning
is triggered:
SAWarning: implicitly coercing SELECT object to scalar subquery; please use the .scalar_subquery() method to produce a scalar subquery.
Also in my example I used selectinload
(from sqlalchemy.orm
) as an option as suggested in the answer I referenced to show what would happen - the following queries would instead be emitted:
INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:SELECT parent.id, parent.name
FROM parent JOIN child ON parent.id = child.parent_id
WHERE parent.id = (SELECT child.parent_id
FROM child
WHERE child.id = ?)
INFO:sqlalchemy.engine.Engine:[generated in 0.00014s] (1,)
INFO:sqlalchemy.engine.Engine:SELECT child.parent_id AS child_parent_id, child.id AS child_id, child.name AS child_name
FROM child
WHERE child.parent_id IN (?)
INFO:sqlalchemy.engine.Engine:[generated in 0.00016s] (1,)
INFO:sqlalchemy.engine.Engine:ROLLBACK
Splitting the query up may be necessary - as query using the either contains_eager
and joinedload
option will result in a JOIN
query of some kind which would result in all values from the left hand side (in this case the Parent
) be repeated for every Child
rows that gets returned by the query. Not too much of a huge deal in this small example, but this can get very expensive quickly if there are many columns in Parent
(and worse if they are very large columns) and/or there are many Child
rows that will be returned by the query due to how a join query involves doing a Cartesian product (multiplication) between the two sides of the join, thus this strategy is generally not recommended for 1-N or N-to-N relationships. A funny quote (also from that thread) about why doing the join query might be bad, "I was a witness when someone performed several joinedloads and DB returned output 25 times bigger than entire database :)".
For some additional background, this answer goes into a bit more details about joinedload
vs selectinload
; this answer discusses the difference between joinedload
and contains_eager
as to the difference between them, and an example on the latter."
英文:
The answer @IanWilson wrote is quite excellent, but to expand on the subquery case and expand on his comment about selectinload
, I decided to also contribute an answer.
If you know for a fact that a child will only have one parent, the IN
comparison can be fully avoided by using the standard ==
, like the following example:
with Session(engine) as session:
# The parent with Child.id == 1, which contains it plus its siblings
query = (
select(Parent)
.join(Parent.children) # plurals are hard (childs isn't quite right)
.where(Parent.id ==
select(Child.parent_id)
.where(Child.id == 1)
.scalar_subquery() # to ensure a single result returned here
)
.options(selectinload(Parent.children))
)
result = session.execute(query).unique().all()
print(result)
Do note that there is an additional function call attached to the subquery - .scalar_subquery()
is needed otherwise this SAWarning
is triggered:
SAWarning: implicitly coercing SELECT object to scalar subquery; please use the .scalar_subquery() method to produce a scalar subquery.
Also in my example I used selectinload
(from sqlalchemy.orm
) as an option as suggested in the answer I referenced to show what would happen - the following queries would instead be emitted:
INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:SELECT parent.id, parent.name
FROM parent JOIN child ON parent.id = child.parent_id
WHERE parent.id = (SELECT child.parent_id
FROM child
WHERE child.id = ?)
INFO:sqlalchemy.engine.Engine:[generated in 0.00014s] (1,)
INFO:sqlalchemy.engine.Engine:SELECT child.parent_id AS child_parent_id, child.id AS child_id, child.name AS child_name
FROM child
WHERE child.parent_id IN (?)
INFO:sqlalchemy.engine.Engine:[generated in 0.00016s] (1,)
INFO:sqlalchemy.engine.Engine:ROLLBACK
Splitting the query up may be necessary - as query using the either contains_eager
and joinedload
option will result in a JOIN
query of some kind which would result in all values from the left hand side (in this case the Parent
) be repeated for every Child
rows that gets returned by the query. Not too much of a huge deal in this small example, but this can get very expensive quickly if there are many columns in Parent
(and worse if they are very large columns) and/or there are many Child
rows that will be returned by the query due to how a join query involves doing a Cartesian product (multiplication) between the two sides of the join, thus this strategy is generally not recommended for 1-N or N-to-N relationships. A funny quote (also from that thread) about why doing the join query might be bad, "I was a witness when someone performed several joinedloads and DB returned output 25 times bigger than entire database :)".
For some additional background, this answer goes into a bit more details about joinedload
vs selectinload
; this answer discusses the difference between joinedload
and contains_eager
as to the difference between them, and an example on the latter.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论