
sql - PARTITION BY with ORDER BY clause in PostgreSQL

Reposted. Author: 行者123. Updated: 2023-11-29 12:07:09

I have a table with these values:

user_id  ts                   val
uid1     19.05.2019 01:49:50  0
uid1     19.05.2019 01:50:15  0
uid1     19.05.2019 01:50:20  0
uid1     19.05.2019 01:59:50  1
uid1     19.05.2019 02:20:10  1
uid1     19.05.2019 02:20:15  0
uid1     19.05.2019 02:20:19  0
uid1     19.05.2019 02:30:53  1
uid1     19.05.2019 11:10:25  1
uid1     19.05.2019 11:13:40  0
uid1     19.05.2019 11:13:50  0
uid1     19.05.2019 11:20:19  1
uid2     19.05.2019 15:01:44  0
uid2     19.05.2019 15:05:55  0
uid2     19.05.2019 17:19:35  1
uid2     19.05.2019 17:20:01  0
uid2     19.05.2019 17:20:35  0
uid2     19.05.2019 19:15:50  1

When I query this table with only a PARTITION BY clause, the result looks like this:

Query: select *, sum(val) over (partition by user_id) as res from example_table;

user_id  ts                   val  res
uid1     19.05.2019 01:49:50  0    5
uid1     19.05.2019 01:50:15  0    5
uid1     19.05.2019 01:50:20  0    5
uid1     19.05.2019 01:59:50  1    5
uid1     19.05.2019 02:20:10  1    5
uid1     19.05.2019 02:20:15  0    5
uid1     19.05.2019 02:20:19  0    5
uid1     19.05.2019 02:30:53  1    5
uid1     19.05.2019 11:10:25  1    5
uid1     19.05.2019 11:13:40  0    5
uid1     19.05.2019 11:13:50  0    5
uid1     19.05.2019 11:20:19  1    5
uid2     19.05.2019 15:01:44  0    2
uid2     19.05.2019 15:05:55  0    2
uid2     19.05.2019 17:19:35  1    2
uid2     19.05.2019 17:20:01  0    2
uid2     19.05.2019 17:20:35  0    2
uid2     19.05.2019 19:15:50  1    2

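In the result above, the res column holds the total of the val column for each partition. However, if I query the table with both PARTITION BY and ORDER BY, I get these results: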

Query: select *, sum(val) over (partition by user_id order by ts) as res from example_table;

user_id  ts                   val  res
uid1     19.05.2019 01:49:50  0    0
uid1     19.05.2019 01:50:15  0    0
uid1     19.05.2019 01:50:20  0    0
uid1     19.05.2019 01:59:50  1    1
uid1     19.05.2019 02:20:10  1    2
uid1     19.05.2019 02:20:15  0    2
uid1     19.05.2019 02:20:19  0    2
uid1     19.05.2019 02:30:53  1    3
uid1     19.05.2019 11:10:25  1    4
uid1     19.05.2019 11:13:40  0    4
uid1     19.05.2019 11:13:50  0    4
uid1     19.05.2019 11:20:19  1    5
uid2     19.05.2019 15:01:44  0    0
uid2     19.05.2019 15:05:55  0    0
uid2     19.05.2019 17:19:35  1    1
uid2     19.05.2019 17:20:01  0    1
uid2     19.05.2019 17:20:35  0    1
uid2     19.05.2019 19:15:50  1    2

With the ORDER BY clause, however, the res column holds the cumulative (running) sum of the val column up to each row within its partition.
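Both outputs can be reproduced side by side with a short script. This is a sketch that uses SQLite (3.25 or newer, through Python's standard `sqlite3` module) as a stand-in for PostgreSQL; SQLite applies the same default window frame. The timestamps are rewritten as ISO-8601 text (an assumption) so that `order by ts` sorts chronologically, and the outer `order by` only makes the printed row order deterministic.

```python
import sqlite3

# The question's sample rows. Timestamps are stored as ISO-8601 text
# (an assumption of this sketch) so that ORDER BY ts sorts chronologically.
SAMPLE = [
    ("uid1", "2019-05-19 01:49:50", 0),
    ("uid1", "2019-05-19 01:50:15", 0),
    ("uid1", "2019-05-19 01:50:20", 0),
    ("uid1", "2019-05-19 01:59:50", 1),
    ("uid1", "2019-05-19 02:20:10", 1),
    ("uid1", "2019-05-19 02:20:15", 0),
    ("uid1", "2019-05-19 02:20:19", 0),
    ("uid1", "2019-05-19 02:30:53", 1),
    ("uid1", "2019-05-19 11:10:25", 1),
    ("uid1", "2019-05-19 11:13:40", 0),
    ("uid1", "2019-05-19 11:13:50", 0),
    ("uid1", "2019-05-19 11:20:19", 1),
    ("uid2", "2019-05-19 15:01:44", 0),
    ("uid2", "2019-05-19 15:05:55", 0),
    ("uid2", "2019-05-19 17:19:35", 1),
    ("uid2", "2019-05-19 17:20:01", 0),
    ("uid2", "2019-05-19 17:20:35", 0),
    ("uid2", "2019-05-19 19:15:50", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table example_table (user_id text, ts text, val integer)")
conn.executemany("insert into example_table values (?, ?, ?)", SAMPLE)

rows = conn.execute(
    """
    select user_id, ts, val,
           -- no ORDER BY: the frame is the whole partition
           sum(val) over (partition by user_id) as res_total,
           -- ORDER BY ts: the frame ends at the current row (and its peers)
           sum(val) over (partition by user_id order by ts) as res_running
    from example_table
    order by user_id, ts
    """
).fetchall()

for row in rows:
    print(*row)
```

The `res_total` column matches the first query's output and `res_running` matches the second.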

Why? I don't understand this.

Best answer

Update

This behavior is documented in the PostgreSQL manual:

4.2.8. Window Function Calls

[..] The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer. Without ORDER BY, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.

This means:

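Without a frame_clause, RANGE UNBOUNDED PRECEDING is used by default. The frame then includes:

  • all rows that come "before" the current row according to the ORDER BY clause
  • the current row
  • all rows that have the same values in the ORDER BY columns as the current row (its peers)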
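The three bullets can be modelled in a few lines of plain Python. This is a conceptual sketch, not how PostgreSQL actually evaluates window functions; the helper names `default_frame_sum` and `rows_frame_sum` and the tiny data set containing a tie are hypothetical. It contrasts the default RANGE frame, which extends to the current row's last peer, with a ROWS frame, which stops at the current row itself.

```python
def default_frame_sum(rows):
    """Model of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

    rows: list of (partition_key, order_key, val) tuples.
    The frame for each row is every row of the same partition whose
    order_key is <= the current one: the rows before it, the row itself,
    and its peers (rows with an equal order_key).
    """
    result = []
    for part, key, _ in rows:
        frame = [v for p, k, v in rows if p == part and k <= key]
        result.append(sum(frame))
    return result


def rows_frame_sum(rows):
    """Model of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

    Assumes rows are already sorted by (partition_key, order_key).
    The frame stops at the current physical row, so peers that come
    after it are excluded. In a real database the order among peers is
    unspecified, so tied rows may receive their values in either order.
    """
    result = []
    for i, (part, _, _) in enumerate(rows):
        frame = [v for p, _, v in rows[: i + 1] if p == part]
        result.append(sum(frame))
    return result


# Hypothetical data: the second and third rows are peers (order_key 2).
data = [
    ("u", 1, 1),
    ("u", 2, 10),
    ("u", 2, 100),
    ("u", 3, 1000),
]
print(default_frame_sum(data))  # [1, 111, 111, 1111]
print(rows_frame_sum(data))     # [1, 11, 111, 1111]
```

Both peers get 111 under the RANGE model because each one's frame includes the other.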

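Without an ORDER BY clause, ORDER BY NULL is assumed (though again, I'm guessing). The frame therefore includes all rows in the partition, because every row has the same value (always NULL) in the ORDER BY column. This is consistent with the documentation quoted above: without ORDER BY, all rows become peers of the current row.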
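The all-peers effect can be observed without ORDER BY NULL: ordering by a column that is constant within each partition (here `user_id` itself) makes every row a peer of every other row, so the default frame again spans the whole partition. A sketch using Python's `sqlite3` (SQLite 3.25+ assumed) as a stand-in for PostgreSQL, with the question's uid2 rows and ISO-8601 timestamps as an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table example_table (user_id text, ts text, val integer)")
# The question's uid2 rows, with timestamps rewritten as ISO-8601 text.
conn.executemany(
    "insert into example_table values (?, ?, ?)",
    [
        ("uid2", "2019-05-19 15:01:44", 0),
        ("uid2", "2019-05-19 15:05:55", 0),
        ("uid2", "2019-05-19 17:19:35", 1),
        ("uid2", "2019-05-19 17:20:01", 0),
        ("uid2", "2019-05-19 17:20:35", 0),
        ("uid2", "2019-05-19 19:15:50", 1),
    ],
)

rows = conn.execute(
    """
    select ts,
           -- no ORDER BY: every row is a peer, frame = whole partition
           sum(val) over (partition by user_id) as no_order,
           -- ORDER BY a column that is constant in the partition: same effect
           sum(val) over (partition by user_id order by user_id) as all_peers,
           -- ORDER BY ts: distinct keys, so the frame grows row by row
           sum(val) over (partition by user_id order by ts) as by_ts
    from example_table
    order by ts
    """
).fetchall()

for row in rows:
    print(*row)
```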

Original answer:

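Disclaimer: the following is a guess rather than an authoritative answer. I haven't found any documentation that confirms what I write. At the same time, I don't think the answers given so far explain the behavior correctly.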

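The difference in the results is not caused by the ORDER BY clause as such, since a + b + c is the same as c + b + a. Instead (and this is my guess), the ORDER BY clause implicitly defines the frame_clause as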

rows between unbounded preceding and current row
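This guessed ROWS frame and the documented RANGE default (see the update above) produce identical output on the question's data only because no two rows in a partition share a ts value. A sketch with hypothetical tied rows, again using Python's `sqlite3` (SQLite 3.25+) in place of PostgreSQL, shows where the two frames diverge:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table example_table (user_id text, ts text, val integer)")
# Hypothetical rows: the middle two share the same ts, which never
# happens in the question's data.
conn.executemany(
    "insert into example_table values (?, ?, ?)",
    [
        ("uid1", "2019-05-19 01:00:00", 1),
        ("uid1", "2019-05-19 02:00:00", 1),
        ("uid1", "2019-05-19 02:00:00", 1),
        ("uid1", "2019-05-19 03:00:00", 1),
    ],
)

rows = conn.execute(
    """
    select ts,
           -- default frame: RANGE ... CURRENT ROW, includes peers
           sum(val) over (partition by user_id order by ts) as default_frame,
           -- explicit ROWS frame: stops at the current physical row
           sum(val) over (
               partition by user_id
               order by ts
               rows between unbounded preceding and current row
           ) as rows_frame
    from example_table
    -- rows_frame breaks the tie only to make the printout deterministic;
    -- which tied row gets 2 and which gets 3 is unspecified
    order by ts, rows_frame
    """
).fetchall()

for row in rows:
    print(*row)
```

Both tied rows get 3 under the default frame, while the ROWS frame gives them 2 and 3.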

Try the following query:

select *
     , sum(val) over (partition by user_id) as res
     , sum(val) over (partition by user_id order by ts) as res_order_by
     , sum(val) over (
           partition by user_id
           order by ts
           rows between unbounded preceding and current row
       ) as res_order_by_unbounded_preceding
     , sum(val) over (
           partition by user_id
           -- order by ts
           rows between unbounded preceding and current row
       ) as res_preceding
     , sum(val) over (
           partition by user_id
           -- order by ts
           rows between current row and unbounded following
       ) as res_following
     , sum(val) over (
           partition by user_id
           order by ts
           rows between unbounded preceding and unbounded following
       ) as res_orderby_preceding_following
from example_table;

db<>fiddle

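You will see that you can get a running sum without an ORDER BY clause (although the row order inside the partition is then unspecified, so that result is not deterministic), and a "full" sum with an ORDER BY clause.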
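A sketch of the deterministic variants, assuming Python's `sqlite3` (SQLite 3.25+) as a stand-in for PostgreSQL and the question's uid2 rows with ISO-8601 timestamps: keep `order by ts` and pick the frame explicitly, to run from the partition start, to span the whole partition, or to run from the current row to the partition end. The column names `running`, `full_sum` and `remaining` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table example_table (user_id text, ts text, val integer)")
# The question's uid2 rows, with timestamps rewritten as ISO-8601 text.
conn.executemany(
    "insert into example_table values (?, ?, ?)",
    [
        ("uid2", "2019-05-19 15:01:44", 0),
        ("uid2", "2019-05-19 15:05:55", 0),
        ("uid2", "2019-05-19 17:19:35", 1),
        ("uid2", "2019-05-19 17:20:01", 0),
        ("uid2", "2019-05-19 17:20:35", 0),
        ("uid2", "2019-05-19 19:15:50", 1),
    ],
)

rows = conn.execute(
    """
    select ts,
           -- running total from the start of the partition
           sum(val) over (partition by user_id order by ts
                          rows between unbounded preceding and current row) as running,
           -- whole partition, even though ORDER BY is present
           sum(val) over (partition by user_id order by ts
                          rows between unbounded preceding and unbounded following) as full_sum,
           -- total from the current row to the end of the partition
           sum(val) over (partition by user_id order by ts
                          rows between current row and unbounded following) as remaining
    from example_table
    order by ts
    """
).fetchall()

for row in rows:
    print(*row)
```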

Regarding sql - PARTITION BY with ORDER BY clause in PostgreSQL, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57639840/
