gpt4 book ai didi

sql - 如何对数据集进行密集排名

转载 作者:行者123 更新时间:2023-12-01 22:35:48 24 4
gpt4 key购买 nike

我正在尝试获得密集排名以将数据集分组在一起。在我的表中,我有 ID、GRP_SET、SUB_SET 和 INTERVAL,它们仅表示日期字段。当使用 ID 插入记录时,它们将作为 3 行的 GRP_SET 插入,显示为 SUB_SET。正如您所看到的,当插入发生时,间隔在完成插入集之前可能会略有变化。

这里是一些示例数据,DRANK 列代表我想要获得的排名。

with q as (
select 1 id, 'a' GRP_SET, 1 as SUB_SET, 123 as interval, 1 as DRANK from dual union all
select 1, 'a', 2, 123, 1 from dual union all
select 1, 'a', 3, 124, 1 from dual union all
select 1, 'b', 1, 234, 2 from dual union all
select 1, 'b', 2, 235, 2 from dual union all
select 1, 'b', 3, 235, 2 from dual union all
select 1, 'a', 1, 331, 3 from dual union all
select 1, 'a', 2, 331, 3 from dual union all
select 1, 'a', 3, 331, 3 from dual)

select * from q

示例数据

ID GRP_SET SUBSET INTERVAL DRANK
1 a 1 123 1
1 a 2 123 1
1 a 3 124 1
1 b 1 234 2
1 b 3 235 2
1 b 2 235 2
1 a 1 331 3
1 a 2 331 3
1 a 3 331 3

这是我的查询,它很接近,但我似乎需要类似的东西:

  • 分区依据: ID
  • 分区内排序依据: ID、间隔
  • 在以下情况下更改排名:ID、GRP_SET(更改)
<小时/>
select
id, GRP_SET, SUB_SET, interval,
DENSE_RANK() over (partition by ID order by id, GRP_SET) as DRANK_TEST
from q
Order by
id, interval

最佳答案

使用MODEL子句

瞧,您的需求超出了“普通”SQL 可以轻松表达的范围。但幸运的是,您使用的是 Oracle,它具有 MODEL 子句,这是一个其神秘之处仅在于其强大功能的设备 ( excellent whitepaper here )。你应该写:

SELECT
id, grp_set, sub_set, interval, drank
FROM (
SELECT id, grp_set, sub_set, interval, 1 drank
FROM q
)
MODEL PARTITION BY (id)
DIMENSION BY (row_number() OVER (ORDER BY interval, sub_set) rn)
MEASURES (grp_set, sub_set, interval, drank)
RULES (
drank[any] = NVL(drank[cv(rn) - 1] +
DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
)

Proof on SQLFiddle

说明:

SELECT
id, grp_set, sub_set, interval, drank
FROM (
-- Here, we initialise your "dense rank" to 1
SELECT id, grp_set, sub_set, interval, 1 drank
FROM q
)

-- Then we partition the data set by ID (that's your requirement)
MODEL PARTITION BY (id)

-- We generate row numbers for all columns ordered by interval and sub_set,
-- such that we can then access row numbers in that particular order
DIMENSION BY (row_number() OVER (ORDER BY interval, sub_set) rn)

-- These are the columns that we want to generate from the MODEL clause
MEASURES (grp_set, sub_set, interval, drank)

-- And the rules are simple: Each "dense rank" value is equal to the
-- previous "dense rank" value + 1, if the grp_set value has changed
RULES (
drank[any] = NVL(drank[cv(rn) - 1] +
DECODE(grp_set[cv(rn) - 1], grp_set[cv(rn)], 0, 1), 1)
)

当然,这仅在没有交错事件的情况下才有效,即 123 和 124 之间除了 a 之外没有其他 grp_set

关于sql - 如何对数据集进行密集排名,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/28055975/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com