sql - How to efficiently find the running latest update for multiple records in SQL?

Reposted. Author: 搜寻专家. Updated: 2023-10-30 22:07:25

Consider the following schema:

-- items which have periodic updates
CREATE TABLE items (
[id] int identity(1, 1) primary key,
[name] varchar(100) not null
);

-- item updates. updating an item generally means it has a new status, at a certain time.
CREATE TABLE updates (
[id] int identity(1, 1) primary key,
[item_id] int foreign key references items([id]),
[new_status] varchar(100) not null,
[update_date] datetime not null
);

It is used to track each item's status as it moves through several states over time.

I have been struggling to find an efficient query that answers the following question:

For many items, each of which can be in one of several states and whose status updates we log, how many items are in each state at the end of each day?

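I have a SQLFiddle here with some sample data and my current attempt at this query. It works fine for a few items, but my database has hundreds of thousands of items, so the query currently takes about 5 minutes to run.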

Is there a more efficient query that answers this question?

Test data:

-- let's just say that we just created 3 new items
INSERT INTO items (name)
VALUES ('item1'), ('item2'), ('item3');

-- and they all start in the new state
INSERT INTO updates (item_id, new_status, update_date)
SELECT
[id],
[new_status] = 'new',
[update_date] = '2017-10-09 00:00:00.000'
FROM items;

-- then we have them update over the course of a couple days
-- item 1
INSERT INTO updates (item_id, new_status, update_date)
SELECT [id], [new_status] = 'in progress', [update_date] = '2017-10-10 00:00:00.000'
FROM items WHERE [name] = 'item1'
UNION
SELECT [id], [new_status] = 'ready', [update_date] = '2017-10-12 00:00:00.000'
FROM items WHERE [name] = 'item1'
UNION
SELECT [id], [new_status] = 'complete', [update_date] = '2017-10-14 00:00:00.000'
FROM items WHERE [name] = 'item1';

-- item 2
INSERT INTO updates (item_id, new_status, update_date)
SELECT [id], [new_status] = 'in progress', [update_date] = '2017-10-10 00:00:00.000'
FROM items WHERE [name] = 'item2'
UNION
SELECT [id], [new_status] = 'ready', [update_date] = '2017-10-11 00:00:00.000'
FROM items WHERE [name] = 'item2'
UNION
SELECT [id], [new_status] = 'complete', [update_date] = '2017-10-12 00:00:00.000'
FROM items WHERE [name] = 'item2';

-- item 3
INSERT INTO updates (item_id, new_status, update_date)
SELECT [id], [new_status] = 'in progress', [update_date] = '2017-10-11 00:00:00.000'
FROM items WHERE [name] = 'item3'
UNION
SELECT [id], [new_status] = 'ready', [update_date] = '2017-10-13 00:00:00.000'
FROM items WHERE [name] = 'item3'
UNION
SELECT [id], [new_status] = 'complete', [update_date] = '2017-10-15 00:00:00.000'
FROM items WHERE [name] = 'item3';

Current query:

-- =======================
-- Running latest record
-- =======================
-- Goal: For a period of time, with multiple items, which have multiple updates,
-- find the number of items which are in each state at the end of a day.
--
-- Issue: how can I improve this query for a large database?
--

SELECT
dates.[update_date],
state = latest_update.[new_status],
volume = COUNT(*)
FROM items i -- start with the items that we want to count per day
CROSS JOIN (
SELECT DISTINCT [update_date] FROM updates
) dates -- the days to count for
CROSS APPLY (
-- this cross apply gets all updates for an item, that occurred on or before each date
SELECT
updates.*,
RN = ROW_NUMBER() OVER (PARTITION BY [item_id] ORDER BY [update_date] DESC)
FROM updates
WHERE [update_date] <= dates.[update_date] AND [item_id] = i.[id]
) latest_update
WHERE latest_update.RN = 1 -- only count the latest update
GROUP BY dates.[update_date], latest_update.[new_status]
ORDER BY dates.[update_date], latest_update.[new_status]

[Results]:

|          update_date |       state | volume |
|----------------------|-------------|--------|
| 2017-10-09T00:00:00Z | new | 3 |
| 2017-10-10T00:00:00Z | in progress | 2 |
| 2017-10-10T00:00:00Z | new | 1 |
| 2017-10-11T00:00:00Z | in progress | 2 |
| 2017-10-11T00:00:00Z | ready | 1 |
| 2017-10-12T00:00:00Z | complete | 1 |
| 2017-10-12T00:00:00Z | in progress | 1 |
| 2017-10-12T00:00:00Z | ready | 1 |
| 2017-10-13T00:00:00Z | complete | 1 |
| 2017-10-13T00:00:00Z | ready | 2 |
| 2017-10-14T00:00:00Z | complete | 2 |
| 2017-10-14T00:00:00Z | ready | 1 |
| 2017-10-15T00:00:00Z | complete | 3 |
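To pin down what the query and the table above compute, here is a brute-force restatement in Python. It is a sketch, assuming the sample rows are held in memory as (item_id, new_status, day) tuples; the names UPDATES and end_of_day_counts are hypothetical, not from the question. For every (day, item) pair it looks up the item's latest update on or before that day, which is the same per-pair lookup the CROSS APPLY performs, so the work grows with items × days. That is one plausible reason the query slows down at hundreds of thousands of items.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from datetime import date

# Hypothetical in-memory copy of the question's sample data: (item_id, new_status, day of update).
UPDATES = [
    (1, "new", date(2017, 10, 9)), (2, "new", date(2017, 10, 9)), (3, "new", date(2017, 10, 9)),
    (1, "in progress", date(2017, 10, 10)), (1, "ready", date(2017, 10, 12)),
    (1, "complete", date(2017, 10, 14)),
    (2, "in progress", date(2017, 10, 10)), (2, "ready", date(2017, 10, 11)),
    (2, "complete", date(2017, 10, 12)),
    (3, "in progress", date(2017, 10, 11)), (3, "ready", date(2017, 10, 13)),
    (3, "complete", date(2017, 10, 15)),
]


def end_of_day_counts(updates):
    """Map (day, status) -> number of items whose latest update on or before `day` set `status`."""
    # Group each item's updates chronologically (sorted() is stable, so same-day
    # updates keep their input order and the later one wins).
    per_item = defaultdict(list)
    for item_id, status, day in sorted(updates, key=lambda u: u[2]):
        per_item[item_id].append((day, status))
    # Parallel lists per item: update days (searched by bisection) and statuses.
    histories = [([d for d, _ in h], [s for _, s in h]) for h in per_item.values()]

    counts = Counter()
    for day in sorted({u[2] for u in updates}):  # reporting days = distinct update days
        for update_days, statuses in histories:  # one lookup per (day, item) pair
            i = bisect_right(update_days, day) - 1  # index of last update on or before `day`
            if i >= 0:  # items with no update yet are not counted
                counts[(day, statuses[i])] += 1
    return counts
```

For example, end_of_day_counts(UPDATES)[(date(2017, 10, 12), "complete")] gives 1, matching the 2017-10-12 "complete" row of the table.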

Best answer

One approach is to use a window function (row_number) combined with aggregation:

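select cast(update_date as date) as update_day, new_status, count(*) as volume
from (select u.*,
             -- rank each item's updates within a calendar day, latest first
             row_number() over (partition by item_id, cast(update_date as date)
                                order by update_date desc) as seqnum
      from updates u
     ) u
where seqnum = 1  -- each item's last update of the day (only items updated that day)
group by cast(update_date as date), new_status
order by cast(update_date as date), count(*) desc;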
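As a different technique (not the answer's query), each update can be turned into a validity span with LEAD, so every item's history is ordered once, and each reporting day is then joined to the spans that cover it. This is a minimal sketch, assuming SQLite 3.25+ (for window functions) driven through Python's standard sqlite3 module so it runs standalone. The table mirrors the question's updates table; SETUP, RUNNING_COUNTS and running_counts are hypothetical names, and date() is SQLite's date function. On SQL Server 2012 or later, which also supports LEAD, replacing date(update_date) with CAST(update_date AS date) should give an equivalent T-SQL query, though that has not been tested here.

```python
import sqlite3

# Hypothetical setup mirroring the question's updates table and sample rows (SQLite dialect).
SETUP = """
CREATE TABLE updates (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL,
    new_status TEXT NOT NULL,
    update_date TEXT NOT NULL
);
INSERT INTO updates (item_id, new_status, update_date) VALUES
    (1, 'new', '2017-10-09 00:00:00'), (2, 'new', '2017-10-09 00:00:00'),
    (3, 'new', '2017-10-09 00:00:00'),
    (1, 'in progress', '2017-10-10 00:00:00'), (1, 'ready', '2017-10-12 00:00:00'),
    (1, 'complete', '2017-10-14 00:00:00'),
    (2, 'in progress', '2017-10-10 00:00:00'), (2, 'ready', '2017-10-11 00:00:00'),
    (2, 'complete', '2017-10-12 00:00:00'),
    (3, 'in progress', '2017-10-11 00:00:00'), (3, 'ready', '2017-10-13 00:00:00'),
    (3, 'complete', '2017-10-15 00:00:00');
"""

# Each update becomes a span [start_day, end_day): the status holds from the day it was set
# until the day the item's next update replaced it (NULL end_day means still current).
RUNNING_COUNTS = """
WITH spans AS (
    SELECT new_status,
           date(update_date) AS start_day,
           LEAD(date(update_date)) OVER (PARTITION BY item_id ORDER BY update_date, id) AS end_day
    FROM updates
),
report_days AS (
    SELECT DISTINCT date(update_date) AS report_day FROM updates
)
SELECT d.report_day, s.new_status AS state, COUNT(*) AS volume
FROM report_days AS d
JOIN spans AS s
  ON s.start_day <= d.report_day
 AND (s.end_day IS NULL OR s.end_day > d.report_day)  -- replaced the same day: not the end-of-day state
GROUP BY d.report_day, s.new_status
ORDER BY d.report_day, s.new_status;
"""


def running_counts():
    """Return (day, state, volume) rows: items per status at the end of each update day."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(RUNNING_COUNTS).fetchall()
    finally:
        conn.close()
```

For the sample data, running_counts() returns the same 13 (day, state, volume) rows as the results table in the question, with days as 'YYYY-MM-DD' strings. Whether it beats the CROSS APPLY version on hundreds of thousands of items likely depends on indexing (for example on updates(item_id, update_date)) and would need measuring.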

For more on "sql - How to efficiently find the running latest update for multiple records in SQL?", see a similar question on Stack Overflow: https://stackoverflow.com/questions/46656712/
