
hadoop - Sampling a time series at a given interval and computing jumps with Hive QL

Reposted · Author: 可可西里 · Updated: 2023-11-01 14:38:35

I have time-series data in my table: essentially, each row has a timestamp and a value. The frequency of the data is completely irregular.

I want to sample it at a given frequency and, for each interval, extract the relevant statistics: minimum, maximum, last value, change (versus the previous interval), return (change / previous), and possibly more (count, ...).

So here is my input:

08:00:10, 1
08:01:20, 2
08:01:21, 3
08:01:24, 5
08:02:24, 2

For 1-minute sampling, I would like to get the following result (ts, min, max, last, change, return):

ts        m  M  L  Chg   Return   
08:01:00, 1, 1, 1, NULL, NULL
08:02:00, 2, 5, 5, 4, 4
08:03:00, 2, 2, 2, -3, -0.25
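To make the expected grouping concrete, here is a minimal Python sketch (not part of the original question) that labels each row with the end of its minute and computes the same statistics. Note that it yields -0.6 for the last return, consistent with the accepted answer's output, rather than the -0.25 shown above.

```python
from collections import OrderedDict

rows = [("08:00:10", 1), ("08:01:20", 2), ("08:01:21", 3),
        ("08:01:24", 5), ("08:02:24", 2)]

def bucket(ts):
    # Label each row with the end of its minute: drop seconds, add one minute.
    h, m, _ = (int(p) for p in ts.split(":"))
    h, m = h + (m + 1) // 60, (m + 1) % 60
    return f"{h:02d}:{m:02d}:00"

groups = OrderedDict()
for ts, val in rows:                      # rows are already sorted by ts
    groups.setdefault(bucket(ts), []).append(val)

result = []
prev_last = None
for b, vals in groups.items():
    last = vals[-1]                       # last value within the minute
    chg = None if prev_last is None else last - prev_last
    ret = None if prev_last is None else chg / prev_last
    result.append((b, min(vals), max(vals), last, chg, ret))
    prev_last = last

for row in result:
    print(row)
```

This is just a reference for what the Hive query below has to produce; in Hive the same grouping is done with `GROUP BY` and the change/return with `LAG`.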

Best Answer

You can do it with something like this (comments inline):

SELECT
min
, mn
, mx
, l
, l - LAG(l, 1) OVER (ORDER BY min) c
-- This might not be the right calculation. Unsure how -0.25 was derived in question.
, (l - LAG(l, 1) OVER (ORDER BY min)) / (LAG(l, 1) OVER (ORDER BY min)) r
FROM
(
SELECT
min
, MIN(val) mn
, MAX(val) mx
-- We can take MAX here because all l's (last values) for the minute are the same.
, MAX(l) l
FROM
(
SELECT
min
, val
-- The last value of the minute, ordered by the timestamp, using all rows.
, LAST_VALUE(val) OVER (PARTITION BY min ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) l
FROM
(
SELECT
ts
-- Drop the seconds and move forward one minute (labelling each row with the
-- end of its interval) by converting to seconds, adding 60, and then
-- formatting back to a shorter string.
-- 2000-01-01 is a dummy date just to enable the conversion.
, CONCAT(FROM_UNIXTIME(UNIX_TIMESTAMP(CONCAT("2000-01-01 ", ts), "yyyy-MM-dd HH:mm:ss") + 60, "HH:mm"), ":00") min
, val
FROM
-- As from the question.
21908430_input a
) val_by_min
) val_by_min_with_l
GROUP BY min
) min_with_l_m_M
ORDER BY min
;
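As a side check, the minute-label expression in the query can be mirrored in Python (a sketch, not part of the answer) to confirm the boundary handling: parsing with a dummy date, adding 60 seconds, and reformatting drops the seconds and moves the label to the end of the interval, including across hour boundaries.

```python
from datetime import datetime, timedelta

def minute_label(ts):
    # Mirror the Hive expression: prepend a dummy date, add 60 seconds,
    # then keep only HH:mm and append ":00".
    t = datetime.strptime("2000-01-01 " + ts, "%Y-%m-%d %H:%M:%S")
    return (t + timedelta(seconds=60)).strftime("%H:%M") + ":00"

print(minute_label("08:00:10"))  # 08:01:00
print(minute_label("08:59:59"))  # 09:00:00
```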

Result:

+----------+----+----+---+------+------+
| min | mn | mx | l | c | r |
+----------+----+----+---+------+------+
| 08:01:00 | 1 | 1 | 1 | NULL | NULL |
| 08:02:00 | 2 | 5 | 5 | 4 | 4 |
| 08:03:00 | 2 | 2 | 2 | -3 | -0.6 |
+----------+----+----+---+------+------+

Regarding "hadoop - Sampling a time series at a given interval and computing jumps with Hive QL", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21908430/
