英文:
FastAPI - SQLAlchemy - Trouble returning most recent created child record with parent record
问题
I have tried reading the SQLAlchemy documents and trawled many articles here and other places with nothing working. I will admit I am no expert in SQLAlchemy and the solution maybe very simple.
我已经尝试阅读SQLAlchemy的文档,并查阅了许多文章,但没有找到有效的解决方案。我承认我对SQLAlchemy不是专家,解决方案可能非常简单。
I have a FastAPI running on a local Postgres database with two models for a small sample tracking system I want to implement. Simplified models are below:
我在本地Postgres数据库上运行了一个FastAPI,用于一个我想要实现的小样本跟踪系统,我有两个模型。简化后的模型如下:
For the endpoint my schema returns a list of the related LocationLog in the field logs as expected.
I want to limit this to only the latest time_created
, but everything i have tried to do this has failed resulting in either an error or all locations being returned for each sample rather than the most recent.
对于我的端点,我的模式如预期地返回了与字段"logs"相关的LocationLog列表。
我希望将这些限制为仅包括最新的"time_created",但我尝试的一切都失败了,要么出现错误,要么为每个样本返回所有位置,而不是最近的位置。
From my research it points to using a SQLAlchemy query like this:
根据我的研究,它指向使用类似以下的SQLAlchemy查询:
subq = select(func.max(StationLog.time_created).label("max_created"), StationLog.sample_id).group_by(StationLog.sample_id).subquery()
query = select(Sample).join(subq, subq.c.sample_id == Sample.id)
db.scalars(query.order_by(Sample.id).offset(skip).limit(limit)).all()
However, this retrieves all of the related LocationLog records. I've also tried many other variations which either result in an error or just returning all of the related records.
然而,这会检索到所有相关的LocationLog记录。我也尝试了许多其他变种,要么会导致错误,要么只返回所有相关的记录。
Where am I going wrong? Thank in advance!
我错在哪里?提前感谢您!
英文:
I have tried reading the SQLAlchemy documents and trawled many articles here and other places with nothing working. I will admit I am no expert in SQLAlchemy and the solution maybe very simple.
I have a FastAPI running on a local Postgres database with two models for a small sample tracking system I want to implement. Simplified models are below:
class Sample(Base):
__tablename__ = "samples"
id: Mapped[int] = mapped_column(primary_key=True, index=True, autoincrement=True)
: Mapped[str] = mapped_column(String(32), index=True)
...
time_created: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now()
)
logs: Mapped[list["StationLog"]] = relationship(back_populates="sample")
class LocationLog(Base):
__tablename__ = "locationlogs"
id: Mapped[int] = mapped_column(primary_key=True, index=True, autoincrement=True)
...
time_created: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now()
)
sample_id: Mapped[int] = mapped_column(ForeignKey("samples.id"))
sample: Mapped["Sample"] = relationship(back_populates="logs")
For the endpoint my schema returns a list of the related LocationLog in the field logs as expected.
I want to limit this to only the latest time_created
, but everything i have tried to do this has failed resulting in either an error or all locations being returned for each sample rather than the most recent.
From my research it points to using a SQLAlchemy query like this:
subq = select(func.max(StationLog.time_created).label("max_created"), StationLog.sample_id).group_by(StationLog.sample_id).subquery()
query = select(Sample).join(subq, subq.c.sample_id == Sample.id)
db.scalars(query.order_by(Sample.id).offset(skip).limit(limit)).all()
However, this retrieves all of the related LocationLog records. I've also tried many other variations which either result in an error or just returning all of the related records.
Where am I going wrong? Thank in advance!
答案1
得分: 1
这是一个有点奇怪的问题,但我认为这会起作用。请注意,distinct()
将生成一个 DISTINCT ON
子句,这仅适用于 PostgreSQL。您可能可以将子查询和额外的 LEFT OUTER JOIN
结合起来,但我不知道如何做。我认为有几种方法可以实现这一点。
# 为每个样本 ID 返回第一个不同的行,按样本 ID 的 time_created 降序排序。
# 我在排序中添加了站点日志 ID 作为绑定分数,以使结果确定性。
sub = select(
StationLog.sample_id,
StationLog.id,
StationLog.time_created
).order_by(
StationLog.sample_id,
StationLog.time_created.desc(),
StationLog.id
).distinct(
StationLog.sample_id
).subquery()
# 我们对子查询进行外连接,以获取“最新”的站点日志 ID
# 然后再次外连接以获取完整的 StationLog 列集合,以构建完整的 StationLog
stmt = select(
Sample,
StationLog
).select_from(
Sample
).outerjoin(
sub,
Sample.id == sub.c.sample_id
).outerjoin(
StationLog,
sub.c.id == StationLog.id)
result = session.execute(stmt)
# 请注意,当没有日志时,station_log 将为 None
for (sample, station_log) in result:
print (sample, station_log)
希望这对您有所帮助!
英文:
This is kind of a weird problem but I think this will work. Note that distinct()
will generate a DISTINCT ON
clause which is postgresql only. You might be able to combine the sub and extra LEFT OUTER JOIN
but I don't know how. I think there are a few ways to do this.
# return the first distinct row for each sample id, ordering by time_created descending for each sample id.
# I added the station log id in the ordering as a tie breaker to make the results deterministic.
sub = select(
StationLog.sample_id,
StationLog.id,
StationLog.time_created
).order_by(
StationLog.sample_id,
StationLog.time_created.desc(),
StationLog.id
).distinct(
StationLog.sample_id
).subquery()
# we outer join to the subquery to get the "latest" station log id
# then outer join again to fetch the full set of columns to
# build a complete StationLog
stmt = select(
Sample,
StationLog
).select_from(
Sample
).outerjoin(
sub,
Sample.id == sub.c.sample_id
).outerjoin(
StationLog,
sub.c.id == StationLog.id)
result = session.execute(stmt)
# note that station_log will be None when there are no logs
for (sample, station_log) in result:
print (sample, station_log)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论