
SQL with multiple ROW_NUMBER or RANK

Reposted. Author: 行者123. Updated: 2023-12-01 01:11:16

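I need to join the same table several times, between (for example) Person and PersonEvents. Each person has zero or more events. I need to create a view that selects, for each person, certain columns from their most recent event, plus columns from the next most recent event.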

Person data:

Id    Name
1     Iain
2     Fred
3     Mary
4     Foo
5     Bar

PersonEvents data:
PersonId    DateStarted                ReasonForLeaving
1           2011-03-12 00:00:00.000    sick
1           2013-02-12 00:00:00.000    NULL
1           2012-04-12 00:00:00.000    holiday
2           2011-05-12 00:00:00.000    new baby
2           2013-06-12 00:00:00.000    NULL
2           2012-07-12 00:00:00.000    had enough
3           2011-08-12 00:00:00.000    pregnant
3           2013-09-12 00:00:00.000    NULL
4           2012-10-12 00:00:00.000    NULL

A sample of the output would be:
Id    Name    MemberSince                ReasonForChange
1     Iain    2013-02-12 00:00:00.000    holiday
4     Foo     2012-10-12 00:00:00.000    NULL
...

The "old way" used a TOP 1 join or a sub-select statement:
SELECT p.*,
    (
        SELECT TOP 1 DateStarted
        FROM PersonEvents e
        WHERE e.PersonId = p.Id
        ORDER BY DateStarted DESC
    ) AS MemberSince
FROM Person p
....

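However, if you need several columns from this join (for example the date, a comment, and possibly more IDs), you have to write one sub-select per column, which is expensive.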
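To make the cost concrete, here is a minimal sketch of what the sub-select approach turns into once a second column is needed. It uses the Person and PersonEvents tables from the question; the alias LatestReasonForLeaving is a hypothetical name chosen for illustration.

```sql
-- Sketch only: every extra column taken from the latest event needs its own
-- correlated sub-select, so PersonEvents is probed once per column.
SELECT p.Id,
       p.Name,
       (SELECT TOP 1 e.DateStarted
        FROM PersonEvents e
        WHERE e.PersonId = p.Id
        ORDER BY e.DateStarted DESC) AS MemberSince,
       (SELECT TOP 1 e.ReasonForLeaving
        FROM PersonEvents e
        WHERE e.PersonId = p.Id
        ORDER BY e.DateStarted DESC) AS LatestReasonForLeaving  -- hypothetical alias
FROM Person p;
```

Pulling a column from the second most recent event this way would need yet another, more awkward sub-select, which is part of why a row-number approach is attractive.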

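So the question is: how do I use row numbers to get multiple columns from the join for both the most recent and the previous event?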

Best answer

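The most straightforward (i.e. most readable SQL) answer I came up with uses WITH and ROW_NUMBER.

First, write a ROW_NUMBER query that orders the events and gives each event a number that is unique within its PersonId: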

SELECT *,
ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
FROM PersonEvents

Result:
PersonId    DateStarted                ReasonForLeaving    EventOrder
1           2013-02-12 00:00:00.000    NULL                1
1           2012-04-12 00:00:00.000    holiday             2
1           2011-03-12 00:00:00.000    sick                3
2           2013-06-12 00:00:00.000    NULL                1
2           2012-07-12 00:00:00.000    had enough          2
2           2011-05-12 00:00:00.000    new baby            3
3           2013-09-12 00:00:00.000    NULL                1
3           2011-08-12 00:00:00.000    pregnant            2
4           2012-10-12 00:00:00.000    NULL                1

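Now each person's "first" event (the most recent one, in my case) contains the date the change was made. (Real-life example: this is student enrolment history across multiple schools, with school IDs and plenty of other columns.) Each person's "second" event is the previous event, and it carries the reason for leaving. To put the two together: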
WITH SortedEvents AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
    FROM PersonEvents
)
SELECT p.*, MostRecent.DateStarted AS MemberSince, NextRecent.ReasonForLeaving AS ReasonForChange
FROM Person p
LEFT OUTER JOIN SortedEvents AS MostRecent ON p.Id = MostRecent.PersonId AND MostRecent.EventOrder = 1
LEFT OUTER JOIN SortedEvents AS NextRecent ON p.Id = NextRecent.PersonId AND NextRecent.EventOrder = 2

This gives nicely formatted output:
Id    Name    MemberSince                ReasonForChange
1     Iain    2013-02-12 00:00:00.000    holiday
2     Fred    2013-06-12 00:00:00.000    had enough
3     Mary    2013-09-12 00:00:00.000    pregnant
4     Foo     2012-10-12 00:00:00.000    NULL
5     Bar     NULL                       NULL
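Since the question asks for a view, the same query can be wrapped in one. This is a sketch under assumptions: the view name PersonRecentEvents is hypothetical, and the column list is written out explicitly instead of using p.* so the view's shape does not silently depend on later changes to Person.

```sql
-- Sketch: the WITH + ROW_NUMBER query packaged as a view.
-- CREATE VIEW must be the first statement in its batch, hence the GO.
CREATE VIEW PersonRecentEvents  -- hypothetical view name
AS
WITH SortedEvents AS (
    SELECT PersonId, DateStarted, ReasonForLeaving,
           ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
    FROM PersonEvents
)
SELECT p.Id,
       p.Name,
       MostRecent.DateStarted      AS MemberSince,
       NextRecent.ReasonForLeaving AS ReasonForChange
FROM Person p
LEFT OUTER JOIN SortedEvents AS MostRecent
    ON p.Id = MostRecent.PersonId AND MostRecent.EventOrder = 1
LEFT OUTER JOIN SortedEvents AS NextRecent
    ON p.Id = NextRecent.PersonId AND NextRecent.EventOrder = 2;
GO

-- Usage: query the view like a table.
SELECT Id, Name, MemberSince, ReasonForChange
FROM PersonRecentEvents
ORDER BY Id;
```

A person with no events (Bar, in the test data) still appears, with NULL in both event columns, because both joins are outer joins.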

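In fact, you can select multiple columns from any row number. The real-life example (again, student enrolment history) selects:
  • From the master student table:
      • Student ID
      • Name
      • DOB, etc.
  • From the enrolment history table as "current enrolment":
      • School ID
      • Various enrolment status information
      • Start date
  • From the enrolment history table as "previous enrolment":
      • Reason for leaving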
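The real enrolment tables are not shown here, so the following sketch illustrates the same idea on the Person and PersonEvents test tables instead: several columns are taken from the row numbered 1 and several from the row numbered 2. The aliases CurrentEvent and PreviousEvent and the output column names are hypothetical.

```sql
-- Sketch: pull more than one column from each numbered event.
WITH SortedEvents AS (
    SELECT PersonId, DateStarted, ReasonForLeaving,
           ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
    FROM PersonEvents
)
SELECT p.Id,
       p.Name,
       CurrentEvent.DateStarted       AS CurrentDateStarted,
       CurrentEvent.ReasonForLeaving  AS CurrentReasonForLeaving,
       PreviousEvent.DateStarted      AS PreviousDateStarted,
       PreviousEvent.ReasonForLeaving AS PreviousReasonForLeaving
FROM Person p
LEFT OUTER JOIN SortedEvents AS CurrentEvent
    ON CurrentEvent.PersonId = p.Id AND CurrentEvent.EventOrder = 1
LEFT OUTER JOIN SortedEvents AS PreviousEvent
    ON PreviousEvent.PersonId = p.Id AND PreviousEvent.EventOrder = 2;
```

Adding a column costs one more line in the SELECT list and no additional join or sub-select.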

This approach performs well for roughly 150,000 students and their respective histories.
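If the two self-joins ever become a concern on larger data, one alternative worth trying is conditional aggregation, which joins the numbered events once and pivots them with CASE. This is a different technique from the one in the answer, shown only as a sketch on the test tables; I have not measured whether it is faster, and that would depend on the data and indexes.

```sql
-- Sketch: one join to the numbered events, pivoted with MAX(CASE ...).
WITH SortedEvents AS (
    SELECT PersonId, DateStarted, ReasonForLeaving,
           ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
    FROM PersonEvents
)
SELECT p.Id,
       p.Name,
       -- Only the EventOrder = 1 row contributes a non-NULL date.
       MAX(CASE WHEN se.EventOrder = 1 THEN se.DateStarted END)      AS MemberSince,
       -- Only the EventOrder = 2 row contributes a reason.
       MAX(CASE WHEN se.EventOrder = 2 THEN se.ReasonForLeaving END) AS ReasonForChange
FROM Person p
LEFT OUTER JOIN SortedEvents AS se
    ON se.PersonId = p.Id AND se.EventOrder <= 2
GROUP BY p.Id, p.Name
ORDER BY p.Id;
```

SQL Server may emit a warning that NULL values were eliminated by an aggregate; that is expected here, since each CASE deliberately yields NULL for the rows it ignores.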

The full SQL used for my test:
    CREATE TABLE Person
    (
    Id INT NOT NULL,
    Name VARCHAR(50)
    )
    GO
    CREATE TABLE PersonEvents
    (
    PersonId INT NOT NULL,
    DateStarted DATETIME NOT NULL,
    ReasonForLeaving VARCHAR(50)
    )
    GO
    INSERT INTO Person
    SELECT 1, 'Iain' UNION ALL
    SELECT 2, 'Fred' UNION ALL
    SELECT 3, 'Mary' UNION ALL
    SELECT 4, 'Foo' UNION ALL
    SELECT 5, 'Bar'
    GO
    INSERT INTO PersonEvents
    SELECT 1, '20110312', 'sick' UNION ALL
    SELECT 1, '20130212', NULL UNION ALL
    SELECT 1, '20120412', 'holiday' UNION ALL
    SELECT 2, '20110512', 'new baby' UNION ALL
    SELECT 2, '20130612', NULL UNION ALL
    SELECT 2, '20120712', 'had enough' UNION ALL
    SELECT 3, '20110812', 'pregnant' UNION ALL
    SELECT 3, '20130912', NULL UNION ALL
    SELECT 4, '20121012', NULL
    GO

    --SELECT *
    --FROM Person
    --SELECT *
    --FROM PersonEvents
    --GO
    WITH SortedEvents AS (
    SELECT *,
    ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY DateStarted DESC) AS EventOrder
    FROM PersonEvents
    )
    SELECT p.*, MostRecent.DateStarted AS MemberSince, NextRecent.ReasonForLeaving AS ReasonForChange
    FROM Person p
    LEFT OUTER JOIN SortedEvents AS MostRecent ON p.Id = MostRecent.PersonId AND MostRecent.EventOrder = 1
    LEFT OUTER JOIN SortedEvents AS NextRecent ON p.Id = NextRecent.PersonId AND NextRecent.EventOrder = 2
    GO

    SELECT p.*,
    (
    SELECT TOP 1 DateStarted
    FROM PersonEvents pe
    WHERE pe.PersonId = p.Id
    ORDER BY DateStarted DESC
    ) AS MemberSince,
    'unknown' AS ReasonForChange
    FROM Person p
    GO

    DROP TABLE Person
    DROP TABLE PersonEvents
    GO

This Q&A on SQL with multiple ROW_NUMBER or RANK is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/15351690/
