gpt4 book ai didi

sql - Hive Window在多个日期范围内的功能

转载 作者:行者123 更新时间:2023-12-02 21:18:02 26 4
gpt4 key购买 nike

我有一张看起来像下面的表格:

TagName | DateTime          | Value

TagName1|2016-07-06 09:49:34|14
TagName1|2016-07-06 09:50:34|15
TagName1|2016-07-06 09:51:34|18
TagName2|2016-07-03 02:13:34|421
TagName2|2016-07-03 03:13:34|422
TagName3|2016-07-01 03:13:34|14

我想要做的是在此表上针对每个TagName(例如总和,加权平均,最新值,计数等)在定义的日期范围内进行多次汇总。

这是我到目前为止所拥有的:
SELECT *
FROM
(
SELECT
t1.TagName,
reflect("java.util.UUID", "randomUUID") as rv_id,
t2.item_id as rs_id,
from_unixtime(unix_timestamp()) as tstamp,
t1.datetime as last_date,
t1.value as last_value,
t1.minimum as minimum,
t1.maximum as maximum,
t1.count as count,
t1.total as total,
t1.average as average,
SUM(t1.weight_value) OVER (PARTITION BY TagName) as weighted_average,
t1.Rank as Rank
FROM
(SELECT
TagName,
value,
datetime,
MIN(value) OVER (PARTITION BY TagName) as minimum,
MAX(value) OVER (PARTITION BY TagName) as maximum,
ROW_NUMBER() OVER (PARTITION BY TagName ORDER BY datetime DESC) as Rank,
SUM(value) OVER (PARTITION BY TagName) as total,
COUNT(value) OVER (PARTITION BY TagName) as count,
AVG(value) OVER (PARTITION BY TagName) as average,
(unix_timestamp(datetime) - LAG(unix_timestamp(datetime),1) OVER (PARTITION BY TagName ORDER BY datetime))/
(SUM(unix_timestamp(datetime) - LAG(unix_timestamp(datetime),1) OVER (PARTITION BY TagName ORDER BY datetime)) OVER (PARTITION BY TagName)) *
(LAG(value,1) OVER (PARTITION BY TagName ORDER BY datetime)) as weight_value
FROM raw.analog_history_dynamic
WHERE par_date > date_format(date_sub(to_date(current_date), 5),'yyyyMMdd')) t1
LEFT JOIN meta.item_meta t2
ON t1.TagName = t2.name) t3
WHERE t3.Rank =1;

在这种情况下,我要查看最近5天
WHERE par_date > date_format(date_sub(to_date(current_date), 5),'yyyyMMdd'))

除了5天外,我还有10个其他范围,还有一些我需要计算的范围:
-- 1min
WHERE par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 60000;

-- 5Min
WHERE par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 300000;

-- 10 Min
WHERE par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 600000;

-- 30 Min
WHERE par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 1800000;

-- 1 Month
WHERE par_date > date_format(date_sub(to_date(current_date), 30),'yyyyMMdd');

-- 2 Month
WHERE par_date > date_format(date_sub(to_date(current_date), 60),'yyyyMMdd');

至少我想我希望将它们合并在同一分区下,以便所有<1天的聚合(按日期分区的表)

关于能够在一个查询中组合所有这些计算,而不是在不同的where条件下单独执行每个计算的任何想法或建议。

谢谢

最佳答案

In the select query statement only you could use "case when condition;s" which you have given in where clause eg - 

SELECT *
FROM
(
SELECT
t1.TagName,
reflect("java.util.UUID", "randomUUID") as rv_id,
t2.item_id as rs_id,
from_unixtime(unix_timestamp()) as tstamp,
t1.datetime as last_date,
t1.value as last_value,
t1.flag,
t1.minimum as minimum,
t1.maximum as maximum,
t1.count as count,
t1.total as total,
t1.average as average,
SUM(t1.weight_value) OVER (PARTITION BY TagName) as weighted_average,
t1.Rank as Rank
FROM
(SELECT
TagName,
value,
datetime,
case
when par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 60000
then flag_1min
when par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 300000
then flag_5min
when .......and so on
end as flag,
MIN(value) OVER (PARTITION BY TagName) as minimum,
MAX(value) OVER (PARTITION BY TagName) as maximum,
ROW_NUMBER() OVER (PARTITION BY TagName ORDER BY datetime DESC) as Rank,
SUM(value) OVER (PARTITION BY TagName) as total,
COUNT(value) OVER (PARTITION BY TagName) as count,
AVG(value) OVER (PARTITION BY TagName) as average,
(unix_timestamp(datetime) - LAG(unix_timestamp(datetime),1) OVER (PARTITION BY TagName ORDER BY datetime))/
(SUM(unix_timestamp(datetime) - LAG(unix_timestamp(datetime),1) OVER (PARTITION BY TagName ORDER BY datetime)) OVER (PARTITION BY TagName)) *
(LAG(value,1) OVER (PARTITION BY TagName ORDER BY datetime)) as weight_value
FROM raw.analog_history_dynamic
WHERE par_date > date_format(date_sub(to_date(current_date), 5),'yyyyMMdd')) t1
LEFT JOIN meta.item_meta t2
ON t1.TagName = t2.name
group by TagName,
value,
datetime,
case
when par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 60000
then flag_1min
when par_date > date_format(date_sub(to_date(current_date), 1),'yyyyMMdd')
and unix_timestamp(datetime) > unix_timestamp(current_timestamp) - 300000
then flag_5min
when .......and so on
end as flag,) t3
WHERE t3.Rank =1;

NOTE: in the above code of yours, you have forgotten to use GROUP BY function since you had aggregate functions

关于sql - Hive Window在多个日期范围内的功能,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/38226074/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com