
SQL: split data by consecutively increasing sequence, then subset each block by pattern

Reposted. Author: 行者123. Updated: 2023-12-04 23:43:27

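I have data in which I am trying to identify a pattern. However, the data in each table is incomplete (rows are missing). I want to split the table into blocks of complete data and then identify the pattern within each block. I have a column, called sequence, that I can use to determine whether the data is complete.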

The data looks like:

Sequence    Position
1           open
2           closed
3           open
4           open
5           closed
8           closed
9           open
11          open
13          closed
14          open
15          open
18          closed
19          open
20          closed

First, I want to split the data into complete sections:

Sequence    Position
1           open
2           closed
3           open
4           open
5           closed
---------------------------
8           closed
9           open
---------------------------
11          open
---------------------------
13          closed
14          open
15          open
---------------------------
18          closed
19          open
20          closed
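
To make the intended split concrete, here is a minimal sketch in Python rather than SQL. It is an illustration only: the function name split_into_runs is hypothetical, and it assumes each sequence value appears at most once. The idea is that inside a gap-free run, the sequence value minus the row's index stays constant, so that difference can serve as a grouping key.

```python
from itertools import groupby

def split_into_runs(rows):
    """Split (sequence, position) rows into runs of consecutive sequence numbers."""
    ordered = sorted(rows)  # assumption: sequence values are unique
    runs = []
    # Within a gap-free run, sequence minus row index is constant,
    # so a change in that difference marks the start of a new run.
    for _, group in groupby(enumerate(ordered), key=lambda pair: pair[1][0] - pair[0]):
        runs.append([row for _, row in group])
    return runs

rows = [(1, "open"), (2, "closed"), (3, "open"), (4, "open"), (5, "closed"),
        (8, "closed"), (9, "open"), (11, "open"), (13, "closed"), (14, "open"),
        (15, "open"), (18, "closed"), (19, "open"), (20, "closed")]

for run in split_into_runs(rows):
    print([seq for seq, _ in run])
# [1, 2, 3, 4, 5]
# [8, 9]
# [11]
# [13, 14, 15]
# [18, 19, 20]
```

The same "value minus row number" trick is what a SQL solution would typically express with ROW_NUMBER().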

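Then I want to identify the pattern closed, open, ..., open, closed, so that we go from closed to open for n rows (where n is at least 1) and then back to closed.

From the sample data, this would leave:

Sequence    Position
2           closed
3           open
4           open
5           closed
---------------------------
18          closed
19          open
20          closed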
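
The pattern step can be sketched for a single gap-free block as follows. This is a hedged illustration in Python, not SQL; complete_cycles is a hypothetical name, and the sketch assumes the block is already sorted and contains only the values 'open' and 'closed'. When two cycles sit back to back, they share the closed row between them.

```python
def complete_cycles(run):
    """Return the closed, open, ..., open, closed stretches found in one gap-free run."""
    cycles = []
    start = None        # index of the 'closed' row that may begin a cycle
    seen_open = False   # whether an 'open' row has followed that 'closed' row
    for i, (_, position) in enumerate(run):
        if position == "open":
            # An 'open' only counts if a 'closed' row came before it in this run.
            seen_open = start is not None
        else:
            if start is not None and seen_open:
                cycles.append(run[start:i + 1])
            # This 'closed' row may begin the next cycle.
            start, seen_open = i, False
    return cycles

run = [(1, "open"), (2, "closed"), (3, "open"), (4, "open"), (5, "closed")]
print(complete_cycles(run))
# [[(2, 'closed'), (3, 'open'), (4, 'open'), (5, 'closed')]]
```

A block such as 13 closed, 14 open, 15 open yields no cycle, because it never returns to closed.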

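This leaves my final table, on which I can do my analysis, because I know it contains no broken sequences. I also have another column in which position is binary, if that is easier to work with.

The tables are large, so although I think I could write loops to compute the result, I am concerned that approach would not be efficient enough. Alternatively, I was going to pull the whole table into R and then find the resulting table, but that requires pulling everything into R first, so I am wondering whether this is feasible in SQL.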

EDIT: Different sample data that is more representative:

Sequence    Position
1           open
2           closed
3           open
4           open
5           closed
8           closed
9           open
11          open
13          closed
14          open
15          open
18          closed
19          open
20          closed
21          closed
22          closed
23          closed
24          open
25          open
26          closed
27          open

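Note that this should give the same result as before, but also with:

Sequence    Position
23          closed
24          open
25          open
26          closed

Rows 21, 22 and 27 are not included because they do not fit the closed, open, ..., open, closed pattern.

But if we also had 28 closed, we would want 27 and 28, because there is no gap in time and the pattern would fit. If instead of 28 it were 29 closed, we would not want 27 and 29 (because although the pattern is right, the sequence is broken).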
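
These edge cases can be checked with a small end-to-end sketch. It is written in Python for illustration only; kept_sequences is a hypothetical name, and the sketch assumes unique sequence values and treats any position other than 'closed' as open. Each gap-free run is encoded as a string with one character per row, and a regular expression picks out one or more back-to-back cycles.

```python
import re
from itertools import groupby

def kept_sequences(rows):
    """Return the sequence numbers that lie on a complete cycle inside a gap-free run."""
    ordered = sorted(rows)
    kept = []
    # Sequence minus row index is constant inside a gap-free run.
    for _, group in groupby(enumerate(ordered), key=lambda p: p[1][0] - p[0]):
        run = [row for _, row in group]
        # One character per row: 'c' for closed, 'o' for anything else.
        code = "".join("c" if pos == "closed" else "o" for _, pos in run)
        # closed, then (one or more opens, closed) repeated at least once.
        for match in re.finditer(r"c(?:o+c)+", code):
            kept.extend(seq for seq, _ in run[match.start():match.end()])
    return kept

tail = [(23, "closed"), (24, "open"), (25, "open"), (26, "closed"), (27, "open")]
print(kept_sequences(tail))                     # [23, 24, 25, 26]
print(kept_sequences(tail + [(28, "closed")]))  # [23, 24, 25, 26, 27, 28]
print(kept_sequences(tail + [(29, "closed")]))  # [23, 24, 25, 26]
```

With 28 closed, rows 27 and 28 join the kept set; with 29 closed, the gap at 28 starts a new run and row 27 is dropped.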

To add some context, think of a machine that goes from stopped, to running, to stopped. We record the data, but there are gaps in the recording, which are represented here by breaks in the sequence. Besides data going missing in the middle of a stop, running, stop cycle, the recording also sometimes starts when the machine is already running, or ends before the machine stops. I don't want that data, as it is not a complete cycle of stop, running, stop. I only want the complete cycles in which the sequence was continuous. This means I can transform my original data set into one that contains only complete cycles, one after the other.

Best answer

You can use this.

DECLARE @MyTable TABLE (Sequence INT, Position VARCHAR(10));

INSERT INTO @MyTable
VALUES
(1,'open'),
(2,'closed'),
(3,'open'),
(4,'open'),
(5,'closed'),
(8,'closed'),
(9,'open'),
(11,'open'),
(13,'closed'),
(14,'open'),
(15,'open'),
(18,'closed'),
(19,'open'),
(20,'closed'),
(21,'closed'),
(22,'closed'),
(23,'closed'),
(24,'open'),
(25,'open'),
(26,'closed'),
(27,'open');

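-- LAG requires SQL Server 2012 or later.
;WITH CTE AS (
    -- Flag a 'closed' row whose previous row (by Sequence) is also 'closed'.
    SELECT *,
        CASE WHEN Position = 'closed'
              AND LAG(Position) OVER (ORDER BY [Sequence]) = 'closed'
             THEN 1 ELSE 0 END AS CloseMark
    FROM @MyTable
)
, CTE_2 AS (
    -- Shift the sequence by the running count of flags, so that every
    -- closed-closed pair behaves like a gap in the sequence.
    SELECT
        [New_Sequence] = [Sequence] + SUM(CloseMark) OVER (ORDER BY [Sequence] ROWS UNBOUNDED PRECEDING)
        , [Sequence]
        , Position
    FROM CTE
)
, CTE_3 AS (
    SELECT *,
        RN = ROW_NUMBER() OVER (ORDER BY [New_Sequence])
    FROM CTE_2
)
, CTE_4 AS (
    -- New_Sequence - RN is constant within a gap-free island;
    -- record the first and last 'closed' row of each island.
    SELECT ([New_Sequence] - RN) AS G
        , MIN(CASE WHEN Position = 'closed' THEN [Sequence] END) AS MinCloseSq
        , MAX(CASE WHEN Position = 'closed' THEN [Sequence] END) AS MaxCloseSq
    FROM CTE_3
    GROUP BY ([New_Sequence] - RN)
)
-- Keep the rows between the first and last 'closed' of each island
-- that contains at least two 'closed' rows.
SELECT
    CTE.Sequence, CTE.Position
FROM CTE_4
INNER JOIN CTE ON (CTE.Sequence BETWEEN CTE_4.MinCloseSq AND CTE_4.MaxCloseSq)
WHERE
    CTE_4.MaxCloseSq > CTE_4.MinCloseSq
    -- NULLs already fail the comparison above; this check is kept for explicitness.
    AND (CTE_4.MaxCloseSq IS NOT NULL AND CTE_4.MinCloseSq IS NOT NULL)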

Result:

Sequence    Position
----------- ----------
2           closed
3           open
4           open
5           closed
---         ---
18          closed
19          open
20          closed
---         ---
23          closed
24          open
25          open
26          closed
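
To see why the grouping works, the arithmetic of the query can be replayed outside the database. The following Python sketch is one reading of the query, not part of the answer itself; group_keys is a hypothetical name. It computes New_Sequence - RN for every row. The key changes wherever the sequence has a gap (Sequence jumps but RN does not) and wherever two 'closed' rows are adjacent (CloseMark bumps New_Sequence), so each key identifies one candidate island.

```python
def group_keys(rows):
    """Replay New_Sequence - RN from the query for each (sequence, position) row."""
    ordered = sorted(rows)
    keys = {}
    close_marks = 0   # running SUM(CloseMark)
    previous = None   # LAG(Position)
    # New_Sequence increases strictly with Sequence, so ROW_NUMBER() over
    # New_Sequence equals the row's 1-based rank by Sequence.
    for rn, (seq, position) in enumerate(ordered, start=1):
        if position == "closed" and previous == "closed":
            close_marks += 1
        new_sequence = seq + close_marks
        keys[seq] = new_sequence - rn
        previous = position
    return keys

rows = [(1, "open"), (2, "closed"), (3, "open"), (4, "open"), (5, "closed"),
        (8, "closed"), (9, "open"), (11, "open"), (13, "closed"), (14, "open"),
        (15, "open"), (18, "closed"), (19, "open"), (20, "closed"), (21, "closed"),
        (22, "closed"), (23, "closed"), (24, "open"), (25, "open"), (26, "closed"),
        (27, "open")]

keys = group_keys(rows)
print([keys[s] for s in (1, 5, 8, 11, 13, 18, 21, 22, 23, 27)])
# [0, 0, 3, 4, 5, 7, 8, 9, 10, 10]
```

Rows 23 to 27 share key 10; within that island the first and last 'closed' rows are 23 and 26, which is why 27 is trimmed from the result.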

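Regarding splitting data by consecutively increasing sequence in SQL and then subsetting each block by pattern, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46584392/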
