gpt4 book ai didi

python - 将关系的主要连接限制为一个

转载 作者:行者123 更新时间:2023-11-30 22:34:24 25 4
gpt4 key购买 nike

我有一个代表交付的 SQLAlchemy 模型;交货有目的地、包裹 ID 和日期:

class Delivery(Base):
delivery_id = Column(Integer, primary_key=True, autoincrement=True)
parcel_id = Column(ForeignKey('parcels.parcel_id'))
scheduled_date = Column(DateTime)
destination_id = Column(ForeignKey('location.location_id'))

现在,对于同一包裹,投递的始发地与先前投递的目的地相同。我没有通过维护基于指针的链接列表来非规范化该信息,而是使用计划日期来订购交货,目前如下所示:

def origin(delivery):
prior = session.query(Delivery)
.filter(
Delivery.parcel_id == delivery.parcel_id,
Delivery.scheduled_date < delivery.scheduled_date,
)
.order_by(Delivery.scheduled_date.desc())
.first()
return prior.location_id if prior else None

在纯 SQL 中,我可以将这个单独的查询转换为一个简单的子查询 + 连接,我在加载交付时将其包含在内。我已经足够了,我可以加载当前交付之前发生的所有相关交付:

_prior_delivery = \
select([Delivery.parcel_id, Delivery.scheduled_date, Location]) \
.where(and_(Location.location_id == remote(Delivery.location_id)) \
.order_by(Delivery.scheduled_date.desc()) \
.alias("prior_delivery")

Delivery.origin = relationship(
Location,
primaryjoin=and_(_prior_delivery.c.parcel_id == foreign(Delivery.parcel_id),
_prior_delivery.c.scheduled_date < foreign(Delivery.scheduled_date)),
secondary=_prior_delivery,
secondaryjoin=_prior_delivery.c.location_id == foreign(Location.location_id),
uselist=False,
viewonly=True)

由于 uselist=False,这实际上是有效的;但在幕后,它会返回当前交付之前发生的每一次交付; SQLAlchemy 打印一条警告,并且结果集比需要的大得多。

我的问题:如何将 limit(1) 应用于此只读关系?

最佳答案

第一次尝试

这很困难的原因是关系需要能够连接到主查询中。 SQLAlchemy 需要能够在同一个查询中加载关系,以便实现预加载。问题是,如何编写一个单个查询来加载Delivery列表及其每个来源

SELECT delivery.*, location.* FROM delivery
LEFT JOIN location ON location.location_id = (
SELECT destination_id FROM delivery prior
WHERE delivery.parcel_id = prior.parcel_id
ORDER BY prior.scheduled_date DESC
LIMIT 1
);

实际上,相关子查询

SELECT destination_id FROM delivery prior
WHERE delivery.parcel_id = prior.parcel_id
ORDER BY prior.scheduled_date DESC
LIMIT 1

成为计算外键origin_id,您可以通过它连接到location表。将其转换为 SQLAlchemy,它会是这样的:

delivery = Delivery.__table__
location = Location.__table__
prior = alias(delivery, "prior")
_origin_id = select([prior.c.destination_id])\
.where(delivery.c.parcel_id == prior.c.parcel_id)\
.order_by(prior.c.scheduled_date.desc())\
.limit(1)
Delivery.origin = relationship(
Location,
primaryjoin=_origin_id == location.c.location_id,
viewonly=True)

不幸的是,对于我尝试过的所有remoteforeign注释的组合,这似乎不起作用

使用带有相关子查询的 SELECT 作为辅助

次佳解决方案是使用假辅助表:

SELECT delivery.*, location.* FROM delivery
LEFT JOIN (
SELECT delivery.delivery_id, (
SELECT destination_id FROM delivery prior
WHERE delivery.parcel_id = prior.parcel_id
ORDER BY prior.scheduled_date DESC
LIMIT 1
) AS origin_id FROM delivery
) delivery_origin ON delivery.delivery_id = delivery_origin.delivery_id
LEFT JOIN location ON delivery_origin.origin_id = location.location_id;

在 SQLAlchemy 中,这是:

delivery = Delivery.__table__
location = Location.__table__
current = alias(delivery, "current")
prior = alias(delivery, "prior")
_origin_id = select([prior.c.destination_id])\
.where(current.c.parcel_id == prior.c.parcel_id)\
.order_by(prior.c.scheduled_date.desc())\
.limit(1)\
.label("origin_id")
delivery_origin = select([
UnaryExpression(current.c.delivery_id, operator=custom_op("")).label("delivery_id"),
_origin_id,
]).select_from(current)
Delivery.origin = relationship(
Location,
primaryjoin=delivery.c.delivery_id == foreign(delivery_origin.c.delivery_id),
secondaryjoin=foreign(delivery_origin.c.origin_id) == location.c.location_id,
secondary=delivery_origin,
viewonly=True,
uselist=False)

不幸的是,似乎有一个错误(可能与 this issue 有关)导致 SQLAlchemy 发出错误的连接,因此我们需要应用一个小技巧:

delivery = Delivery.__table__
location = Location.__table__
current = alias(delivery, "current")
prior = alias(delivery, "prior")

# HACK: wrap delivery_id in an empty unary operator
_delivery_id = UnaryExpression(current.c.delivery_id, operator=custom_op(""))\
.label("delivery_id")
# /HACK

_origin_id = select([prior.c.destination_id])\
.where(current.c.parcel_id == prior.c.parcel_id)\
.order_by(prior.c.scheduled_date.desc())\
.limit(1)\
.label("origin_id")
delivery_origin = select([
_delivery_id,
_origin_id,
]).select_from(current)
Delivery.origin = relationship(
Location,
primaryjoin=delivery.c.delivery_id == foreign(delivery_origin.c.delivery_id),
secondaryjoin=foreign(delivery_origin.c.origin_id) == location.c.location_id,
secondary=delivery_origin,
viewonly=True,
uselist=False)

使用带有窗口函数的SELECT作为辅助

可能具有更好性能特征的替代实现是使用窗口函数:

SELECT delivery.*, location.* FROM delivery
LEFT JOIN (
SELECT
delivery.delivery_id,
lag(delivery.delivery_id) OVER (PARTITION BY delivery.parcel_id ORDER BY delivery.scheduled_date) AS origin_id
FROM delivery
) delivery_origin ON delivery.delivery_id = delivery_origin.delivery_id
LEFT JOIN location ON delivery_origin.origin_id = location.location_id;

和以前一样,我们需要应用类似的 hack 来让 SQLAlchemy 生成正确的 SQL:

delivery = Delivery.__table__
location = Location.__table__
current = alias(delivery, "current")
prior = alias(delivery, "prior")

# HACK: wrap delivery_id in an empty unary operator
_delivery_id = UnaryExpression(current.c.delivery_id, operator=custom_op(""))\
.label("delivery_id")
# /HACK

_origin_id = func.lag(current.c.delivery_id)\
.over(partition_by=current.c.parcel_id,
order_by=current.c.scheduled_date)\
.label("origin_id")
delivery_origin = select([
_delivery_id,
_origin_id,
]).select_from(current)
Delivery.origin = relationship(
Location,
primaryjoin=delivery.c.delivery_id == foreign(delivery_origin.c.delivery_id),
secondaryjoin=foreign(delivery_origin.c.origin_id) == location.c.location_id,
secondary=delivery_origin,
viewonly=True,
uselist=False)

关于python - 将关系的主要连接限制为一个,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/44851079/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com