mysql - Generating a list of times from a query with date ranges and multiple joins

Reposted. Author: 行者123. Updated: 2023-11-30 00:47:14
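I'm struggling to get my head around a complex SQL query.

Here is an sqlfiddle with the tables and data: http://sqlfiddle.com/#!2/7de65

It may make more sense if I explain what the tables do.

schedules is a list of train schedules. calling is the list of calling points for a schedule, in the order the train passes them. An activation is created when it is confirmed that the train is going to run[1], and a movement is created when the train passes a given calling point.

calling is linked to schedules via calling.sid. activations is linked to schedules via activations.sid. movement is linked to activations via movement.activation and to calling via movement.calling_id.

Now for the actual question.

I want to generate a list of active trains for each minute. A train is considered active if all of the following hold:

  • It has at least 1 movement associated with its activation (i.e. it has already left its origin)
  • It has no movement associated with its final calling point
  • It was activated less than 24 hours ago

Whenever all of these conditions are met, the train should be considered active and therefore included in the count.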
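
The three conditions can be written down as a small predicate. This is only a sketch for checking the logic outside the database: the function name and its parameters are made up for illustration, the 24-hour window is measured against the minute being tested (the question does not say which reference time to use), and the boundaries are inclusive so that both the 14:20 and the 15:04 minute count as active.

```python
from datetime import datetime, timedelta


def is_active(at, activated, movement_times, final_arrival):
    """Return True if a train counts as active at the minute `at`.

    at             -- the minute being tested
    activated      -- when the activation was created
    movement_times -- datetimes of all movements recorded for the activation
    final_arrival  -- datetime of the movement at the final calling point,
                      or None if the train has not arrived yet
    """
    # Condition 1: at least one movement exists by this minute.
    has_departed = any(t <= at for t in movement_times)
    # Condition 2: no movement at the final calling point before this minute.
    not_arrived = final_arrival is None or final_arrival >= at
    # Condition 3: activated less than 24 hours before this minute (assumption).
    recent = at - activated < timedelta(hours=24)
    return has_departed and not_arrived and recent


departed = datetime(2014, 1, 1, 14, 20)
arrived = datetime(2014, 1, 1, 15, 4)
activated = datetime(2014, 1, 1, 13, 0)
at_departure = is_active(departed, activated, [departed, arrived], arrived)
```

Whether the comparisons should be strict or inclusive depends on how the movement timestamps are rounded, so treat the operators as something to adjust.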

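Using the data in the sqlfiddle above, the train leaves its first calling point at 14:20 and arrives at its last one at 15:04, so it should be included in the count for every minute from 14:20 to 15:04. I wonder if anyone can shed some light on how to do this. I wouldn't consider myself an SQL expert (which is probably why I'm struggling; I wouldn't really consider myself competent either, but that is a different problem, or possibly the same one, I'm not sure).

I have started down this line: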

SELECT
    YEAR( activations.activated ),
    MONTH( activations.activated ),
    DAY( activations.activated ),
    HOUR( activations.activated ),
    MINUTE( activations.activated ),
    count(activations.id)
FROM activations, movement, calling, schedules
WHERE activations.id = movement.activation
  AND movement.calling_id = calling.id
  AND schedules.id = activations.sid
GROUP BY DAYOFYEAR( activations.activated ), HOUR( activations.activated ), MINUTE( activations.activated )

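But I know this is wrong, because each train is listed only once, however long it was active.

I also considered doing this directly in Python, looping once for every minute of the given period. That sort of works, but it is very slow (getting the active trains at minute resolution over 24 hours takes 1440 queries, which is not exactly optimal). So I think there must be either some clever grouping or some kind of loop in SQL, but I don't know how to do either.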
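
One way to avoid the 1440 queries while staying in Python is to fetch each train's active interval once and count the minutes in memory. The sketch below is not from the original post: it assumes a single query (not shown) has already produced one (first_movement, final_arrival) pair per train, truncated to whole minutes, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def active_per_minute(intervals, start, end):
    """Count active trains for each minute in [start, end], both inclusive.

    intervals -- (first_movement, final_arrival) datetime pairs, one per
                 train, assumed to be truncated to whole minutes already
    """
    step = timedelta(minutes=1)
    counts = Counter()
    for departed, arrived in intervals:
        # Only walk the part of the interval that overlaps the report window.
        minute = max(departed, start)
        last = min(arrived, end)
        while minute <= last:
            counts[minute] += 1
            minute += step

    # Emit every minute of the window, including those with no active train.
    result = []
    minute = start
    while minute <= end:
        result.append((minute, counts[minute]))
        minute += step
    return result


report = active_per_minute(
    [(datetime(2014, 1, 1, 14, 20), datetime(2014, 1, 1, 15, 4))],
    start=datetime(2014, 1, 1, 14, 18),
    end=datetime(2014, 1, 1, 15, 7),
)
```

This trades database work for memory and only suits report windows that fit comfortably in a Python process.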

So if I ran the query from 14:18 to 15:07, I would get something like this:

+-----------------+------------------+
| Timestamp | Active services |
+-----------------+------------------+
| 14:18 1/1/2014 | 0 |
| 14:19 1/1/2014 | 0 |
| 14:20 1/1/2014 | 1 |
| 14:21 1/1/2014 | 1 |
| 14:22 1/1/2014 | 1 |
[...
Identical record for every minute through to
...]
| 15:03 1/1/2014 | 1 |
| 15:04 1/1/2014 | 1 |
| 15:05 1/1/2014 | 0 |
| 15:06 1/1/2014 | 0 |
| 15:07 1/1/2014 | 0 |
+-----------------+------------------+

(The format of the timestamp doesn't matter, as long as I can parse it later.)

In my head, I can see it working like this (pseudocode):

while time is between report_start_date and report_end_date:
    records = count(
        activations where number of movements(
            movement.actual < time
        ) > 0 //Number of movements created before current minute
        and
        movement.calling_id = calling_points(
            actual < minute
        ).last.id does not exist //As of this minute doesn't have a movement for last calling point
        and
        activations.activated > now - 24 hours //Was activated less than 24 hours ago
    )
    result timestamp, records
    time + 1 minute

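I've pretty much got the records = count() part sorted; it is either the looping or the grouping by time that I'm unsure about. I could group by the date of the first movement record, but then the train would only show up for the first minute. I want it to show up for every minute it is active.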

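Bonus points

I'm actually trying to implement this in SQLAlchemy (hence the tag). I wanted to get the basics working in SQL before moving it into an SQLAlchemy query, but if you can do it in SQLAlchemy/Python instead of SQL you get something; I'm not quite sure what yet, and it may be hypothetical.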

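  1. Before anyone who actually knows this stuff criticises me: an activation doesn't confirm that the train will run, but it is close enough for my current purposes. My final query will exclude cancellations and the like, but I just want to get the basics right first.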

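Best answer

To produce a result for every possible minute, I would not rely on every possible minute being present as a value in some table in the database. Instead, I would create a "static" table in the database that stores only these timestamps, and build the query starting from there. This is what I did: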

CREATE TABLE "static_time" (
    "yyyymmddhhmm" datetime NOT NULL,
    PRIMARY KEY ("yyyymmddhhmm")
);

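Note: I used an sqlite database for all tests, so you may need to change a few places to use the corresponding mysql constructs.

I also added rows for every minute of 2 days for testing. You should probably do the same, from the time you want to run the first analysis up to some distant year in the future (e.g. 2050-12-31T23:59:00). I did this with sqlalchemy, but I'm sure it would make sense to do it directly in the database with some function or loop: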

from datetime import datetime, timedelta


class StaticTime(Base):
    __tablename__ = 'static_time'
    # reflect the columns from the existing table
    __table_args__ = ({'autoload': True, },)

# ...

def populate_static_time():
    """Insert one row per minute for the two test days."""
    print("Adding static times")
    sdt = datetime(2014, 1, 1)
    edt = sdt + timedelta(days=2)
    cdt = sdt
    while cdt <= edt:
        session.add(StaticTime(yyyymmddhhmm=cdt))
        cdt += timedelta(minutes=1)
    session.commit()

populate_static_time()
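
If you would rather fill the table without the ORM, the same population step can be sketched with the standard library's sqlite3 module. This is an illustration rather than the code used for the tests above: the function signature is an assumption, the timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, and INSERT OR IGNORE is sqlite syntax that mysql spells differently.

```python
import sqlite3
from datetime import datetime, timedelta


def populate_static_time(conn, start, end):
    """Insert one row per minute from start to end (inclusive)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS static_time ("
        "yyyymmddhhmm datetime NOT NULL PRIMARY KEY)"
    )
    minutes = int((end - start).total_seconds() // 60) + 1
    rows = [
        ((start + timedelta(minutes=i)).strftime("%Y-%m-%d %H:%M:%S"),)
        for i in range(minutes)
    ]
    # executemany sends all rows in one call; duplicates are skipped (sqlite only).
    conn.executemany("INSERT OR IGNORE INTO static_time VALUES (?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = populate_static_time(conn, datetime(2014, 1, 1), datetime(2014, 1, 3))
```

Two days at minute resolution is 2881 rows with both endpoints included, so even a range of several decades stays manageable.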

In addition, I assume your SA model includes relationships defined as follows:

# MODEL
class Schedule(Base):
    __tablename__ = 'schedules'
    __table_args__ = ({'autoload': True, },)


class Calling(Base):
    __tablename__ = 'calling'
    __table_args__ = ({'autoload': True, },)


class Activation(Base):
    __tablename__ = 'activations'
    __table_args__ = ({'autoload': True, },)

    # relationships:
    schedule = relationship("Schedule")


class Movement(Base):
    __tablename__ = 'movement'
    __table_args__ = ({'autoload': True, },)

    # relationships:
    # @note: use activation_rel as activation is column name
    activation_rel = relationship("Activation", backref="movements")

Now, let's build the query:

from datetime import datetime
from sqlalchemy import and_, func

# 0. start with all times and proper counting (columns in SELECT)
q = session.query(
    StaticTime.yyyymmddhhmm.label("yyyymmddhhmm"),
    func.count(Activation.id.distinct()).label("count"),
)

# 1. join on the trains which are active (or finished, which will be excluded later)
q = q.filter(Activation.movements.any(Movement.actual < StaticTime.yyyymmddhhmm))

# 2. join on the trains which are not finished (or no rows for those that did not)
# 2.a) subquery to get the "last" calling per sid
last_calling_sqry = (
    session.query(
        Calling.sid.label("sid"),
        func.max(Calling.id).label("max_calling_id"),
    )
    .group_by(Calling.sid)
).subquery("xxx")

# 2.b) subquery to find the movement for the "last" calling
train_done_at_sqry = (
    session.query(
        Activation.id.label("activation_id"),
        Movement.actual.label("arrived_time"),
    )
    .join(last_calling_sqry, Activation.sid == last_calling_sqry.c.sid)
    .join(Movement, and_(
        Movement.calling_id == last_calling_sqry.c.max_calling_id,
        Movement.activation == Activation.id,
    ))
).subquery("yyy")

# 2.c) let's use it now
q = q.outerjoin(
    train_done_at_sqry,
    train_done_at_sqry.c.activation_id == Activation.id,
)
# 2.d) only those that arrived "after" currently tested time
# @note: a NULL arrived_time (no movement yet at the last calling) fails this test
q = q.filter(train_done_at_sqry.c.arrived_time >= StaticTime.yyyymmddhhmm)


# 3. add filter to use only those trains that started in last 24 hours
# @note: do not need this in case when step-X is used as well as it filters
# @TODO: replace func.date(...) with MYSQL version
q = q.filter(Activation.activated >= func.date("now", "-1 days"))

# 4. filter and group by
q = q.group_by(StaticTime.yyyymmddhhmm)
q = q.order_by(StaticTime.yyyymmddhhmm)

# @NOTE: at this point "q" will return only those minutes which have at least 1 active train

# X. FINALLY: WRAP AGAIN TO HAVE ALL MINUTES (also those with no active trains)
sub = q.subquery("sub")
w = session.query(
    StaticTime.yyyymmddhhmm.label("Timestamp"),
    func.ifnull(sub.c.count, 0).label("Active Services"),
)
w = w.outerjoin(sub, sub.c.yyyymmddhhmm == StaticTime.yyyymmddhhmm)
# restrict to the report window (bounds taken from the generated SQL's PARAMS);
# filtering on Activation here would add an unjoined table to the outer FROM
w = w.filter(StaticTime.yyyymmddhhmm >= datetime(2014, 1, 1, 14, 15))
w = w.filter(StaticTime.yyyymmddhhmm <= datetime(2014, 1, 1, 15, 10))

for a in w:
    print(a)

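This is a fairly complex query, and with only the data you provided it is hard to test different scenarios. Hopefully you can compare it against your current results, and the code will give you some hints on how to get this done. Also, I may have joined on the wrong columns in some places (actual vs. planned). And again, this might not work on mysql (I don't have it and don't know it well).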
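
For the 24-hour filter, one way to sidestep the dialect question is to compute the cutoff in Python and pass it as a bound parameter, so no sqlite or mysql date function is involved. Note also that sqlite's date("now", "-1 days") returns a date without a time part, so it is not an exact 24-hour window. The sketch below uses plain sqlite3 with a toy activations table; the helper name and the sample rows are made up for illustration.

```python
import sqlite3
from datetime import datetime, timedelta


def activation_cutoff(now, hours=24):
    """Oldest `activated` value that still counts as recent enough."""
    return now - timedelta(hours=hours)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activations (id INTEGER PRIMARY KEY, activated TEXT)")
conn.executemany(
    "INSERT INTO activations VALUES (?, ?)",
    [
        (1, "2014-01-01 13:00:00"),  # 2 hours before `now` below
        (2, "2013-12-30 09:00:00"),  # more than 24 hours before `now`
    ],
)

now = datetime(2014, 1, 1, 15, 0)  # fixed here so the example is repeatable
cutoff = activation_cutoff(now).strftime("%Y-%m-%d %H:%M:%S")
recent_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM activations WHERE activated >= ? ORDER BY id", (cutoff,)
    )
]
```

In the SQLAlchemy query the same idea would be a comparison of Activation.activated against a Python datetime instead of func.date(...).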

Bonus (in reverse): the SQL statement generated for the w query on sqlite. You may find it easier to start from the raw SQL and move gradually towards sqlalchemy.

SELECT static_time.yyyymmddhhmm AS "Timestamp", ifnull(sub.count, ?) AS "Active Services"
FROM static_time
LEFT OUTER JOIN (
    SELECT static_time.yyyymmddhhmm AS yyyymmddhhmm, count(DISTINCT activations.id) AS count
    FROM activations, static_time
    LEFT OUTER JOIN (
        SELECT activations.id AS activation_id, movement.actual AS arrived_time
        FROM activations
        JOIN (
            SELECT calling.sid AS sid, max(calling.id) AS max_calling_id
            FROM calling
            GROUP BY calling.sid
        ) AS xxx
        ON activations.sid = xxx.sid
        JOIN movement
        ON movement.calling_id = xxx.max_calling_id AND movement.activation = activations.id
    ) AS yyy
    ON yyy.activation_id = activations.id
    WHERE (EXISTS (SELECT 1
        FROM movement
        WHERE activations.id = movement.activation AND movement.actual < static_time.yyyymmddhhmm)
    )
    AND yyy.arrived_time >= static_time.yyyymmddhhmm
    GROUP BY static_time.yyyymmddhhmm
    ORDER BY static_time.yyyymmddhhmm
) AS sub
ON sub.yyyymmddhhmm = static_time.yyyymmddhhmm
WHERE static_time.yyyymmddhhmm >= ? AND static_time.yyyymmddhhmm <= ?

PARAMS: (0, '2014-01-01 14:15:00.000000', '2014-01-01 15:10:00.000000')
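
To experiment with the static-table idea without the sqlfiddle data, here is a self-contained sketch on an in-memory sqlite database. It is a condensed variant, not the generated statement above: the toy schema only contains the columns named in the question, the sample rows are invented to mirror the 14:20 to 15:04 train, the last calling point is taken to be max(calling.id) per sid as in the answer, the comparisons are inclusive, and datetime(..., '-1 day') is sqlite-specific.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calling (id INTEGER PRIMARY KEY, sid INTEGER);
CREATE TABLE activations (id INTEGER PRIMARY KEY, sid INTEGER, activated TEXT);
CREATE TABLE movement (id INTEGER PRIMARY KEY, activation INTEGER,
                       calling_id INTEGER, actual TEXT);
CREATE TABLE static_time (yyyymmddhhmm TEXT PRIMARY KEY);
INSERT INTO calling (id, sid) VALUES (1, 1), (2, 1), (3, 1);
INSERT INTO activations (id, sid, activated) VALUES (1, 1, '2014-01-01 13:00:00');
INSERT INTO movement (activation, calling_id, actual) VALUES
    (1, 1, '2014-01-01 14:20:00'),
    (1, 2, '2014-01-01 14:40:00'),
    (1, 3, '2014-01-01 15:04:00');
""")

# One static_time row per minute around the journey.
minute = datetime(2014, 1, 1, 14, 0)
while minute <= datetime(2014, 1, 1, 15, 30):
    conn.execute("INSERT INTO static_time VALUES (?)",
                 (minute.strftime("%Y-%m-%d %H:%M:%S"),))
    minute += timedelta(minutes=1)

QUERY = """
SELECT t.yyyymmddhhmm, COUNT(DISTINCT a.id)
FROM static_time AS t
LEFT JOIN activations AS a
  -- activated less than 24 hours before the tested minute
  ON a.activated > datetime(t.yyyymmddhhmm, '-1 day')
  -- at least one movement recorded by the tested minute
 AND EXISTS (SELECT 1 FROM movement AS m
             WHERE m.activation = a.id AND m.actual <= t.yyyymmddhhmm)
  -- no movement at the last calling point before the tested minute
 AND NOT EXISTS (SELECT 1 FROM movement AS m
                 WHERE m.activation = a.id
                   AND m.calling_id = (SELECT MAX(c.id) FROM calling AS c
                                       WHERE c.sid = a.sid)
                   AND m.actual < t.yyyymmddhhmm)
WHERE t.yyyymmddhhmm BETWEEN ? AND ?
GROUP BY t.yyyymmddhhmm
ORDER BY t.yyyymmddhhmm
"""

rows = conn.execute(
    QUERY, ("2014-01-01 14:18:00", "2014-01-01 15:07:00")
).fetchall()
for timestamp, active in rows:
    print(timestamp, active)
```

Because the conditions sit in the ON clause of the LEFT JOIN, minutes without any active train still appear with a count of 0, and a train with no movement yet at its last calling point stays in the count.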

Regarding "mysql - Generating a list of times from a query with date ranges and multiple joins", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21262955/
