gpt4 book ai didi

sql - SQL中,去除相邻的重复行并进行时间计算

转载 作者:行者123 更新时间:2023-12-03 20:21:13 28 4
gpt4 key购买 nike

在 Microsoft SQL Server 2012 中,我需要删除 Flow 中相邻的重复行下面的列,只保留第一个(标记为 * 以说明)。之后,我需要取1之间的时间差s 和 0 s 代表所有行并获得总累积时间。

Record Number    Downhole Time      Flow
-------------------------------------------
0 03/27/2013 19:23:48.582 1 *
58 03/27/2013 19:28:12.606 1
137 03/27/2013 19:32:16.070 0 *
143 03/27/2013 19:33:59.070 0
255 03/27/2013 19:40:14.070 0
272 03/29/2013 14:43:55.071 1 *
289 03/29/2013 14:45:44.070 1
293 03/29/2013 14:45:59.071 0 *
294 03/29/2013 14:46:10.070 0

删除相邻记录的结果:
Record Number    Downhole Time      Flow
-------------------------------------------
0 03/27/2013 19:23:48.582 1 *
137 03/27/2013 19:32:16.070 0 *
272 03/29/2013 14:43:55.071 1 *
293 03/29/2013 14:45:59.071 0 *

最终想要的结果
cumulative time difference = 
(03/27/2013 19:32:16.070 - 03/27/2013 19:23:48.582)
+ (03/29/2013 14:45:59.071 - 03/29/2013 14:43:55.071)
+ if there are more rows.

最佳答案

我相信这可以完成您要求的工作:

WITH FlowIntervals AS (
SELECT
FromTime = Min(D.[Downhole Time]),
X.ToTime
FROM
dbo.vLog D
OUTER APPLY (
SELECT TOP 1 ToTime = D2.[Downhole Time]
FROM dbo.vLog D2
WHERE
D.[Downhole Time] < D2.[Downhole Time]
AND D.[Flow] <> D2.[Flow]
ORDER BY D2.[Downhole Time]
) X
WHERE D.Flow = 1
GROUP BY X.ToTime
)
SELECT Sum(DateDiff(ms, FromTime, IsNull(ToTime, GetDate())) / 1000.0)
FROM FlowIntervals
;

此查询适用于 SQL 2005 及更高版本。它将表现不错,但需要 vLog 表的自联接,因此它的性能可能不如使用 LEAD 的解决方案好。或 LAG .

如果您正在寻找绝对最佳的性能,此查询可能会起作用:
WITH Ranks AS (
SELECT
Grp =
Row_Number() OVER (ORDER BY [Downhole Time])
- Row_Number() OVER (PARTITION BY Flow ORDER BY [Downhole Time]),
[Downhole Time],
Flow
FROM dbo.vLog
), Ranges AS (
SELECT
Result = Row_Number() OVER (ORDER BY Min(R.[Downhole Time]), X.Num) / 2,
[Downhole Time] = Min(R.[Downhole Time]),
R.Flow, X.Num
FROM
Ranks R
CROSS JOIN (SELECT 1 UNION ALL SELECT 2) X (Num)
GROUP BY
R.Flow, R.Grp, X.Num
), FlowStates AS (
SELECT
FromTime = Min([Downhole Time]),
ToTime = CASE WHEN Count(*) = 1 THEN NULL ELSE Max([Downhole Time]) END,
Flow = IsNull(Min(CASE WHEN Num = 2 THEN Flow ELSE NULL END), Min(Flow))
FROM Ranges R
WHERE Result > 0
GROUP BY Result
)
SELECT
ElapsedSeconds =
Sum(DateDiff(ms, FromTime, IsNull(ToTime, GetDate())) / 1000.0)
FROM
FlowStates
WHERE
Flow = 1
;

使用您的样本数据,它返回 631.486000 (秒)。如果您只选择 FlowStates 中的行CTE,你得到以下结果:
FromTime                ToTime                  Flow
----------------------- ----------------------- ----
2013-03-27 19:23:48.583 2013-03-27 19:32:16.070 1
2013-03-27 19:32:16.070 2013-03-29 14:43:55.070 0
2013-03-29 14:43:55.070 2013-03-29 14:45:59.070 1
2013-03-29 14:45:59.070 NULL 0

此查询适用于 SQL 2005 及更高版本,并且在性能方面应该与任何其他解决方案相提并论,包括使用 LEAD 的解决方案。或 LAG (这以一种偷偷摸摸的方式模拟)。我不保证它会赢,但它可以做得很好,毕竟可能会赢。

this answer to a similar question有关查询中发生的情况的详细信息。

最后,对于完整的解决方案,这里是 SQL Server 的滞后/领先版本:
WITH StateChanges AS (
SELECT
[Downhole Time],
Flow,
Lag(Flow) OVER (ORDER BY [Downhole Time]) PrevFlow
FROM
dbo.vLog
), Durations AS (
SELECT
[Downhole Time],
Lead([Downhole Time]) OVER (ORDER BY [Downhole Time]) NextTime,
Flow
FROM
StateChanges
WHERE
Flow <> PrevFlow
OR PrevFlow IS NULL
)
SELECT ElapsedTime = Sum(DateDiff(ms, [Downhole Time], NextTime) / 1000.0)
FROM Durations
WHERE Flow = 1
;

此查询需要 SQL Server 2012 或更高版本。它计算状态变化(流是否发生变化?),然后选择流确实发生变化的状态,最后计算流从 0 变为 1(流周期的开始)的持续时间。

我很想知道与其他查询相比,此查询的 I/O 和时间的实际性能结果如何。如果您只看执行计划,这个查询似乎会胜出——但它在实际性能统计数据上可能不是那么明显的赢家。

关于sql - SQL中,去除相邻的重复行并进行时间计算,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/21921673/

28 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com