android - Modifying a SQLite query that uses CTEs

Reposted. Author: 太空狗. Updated: 2023-10-29 16:00:37

My query is:

WITH desc_table(counter, hourly, current_weather_description, current_icons, time_stamp) AS (
    Select count(*) AS counter, CASE WHEN strftime('%M', 'now') < '30'
                                THEN strftime('%H', 'now')
                                ELSE strftime('%H', time_stamp, '+1 hours') END as hourly,
           current_weather_description,
           current_icons,
           time_stamp
    From weather_events
    GROUP BY strftime('%H', time_stamp, '+30 minutes'), current_weather_description
    UNION ALL
    Select count(*) as counter, hourly - 1, current_weather_description, current_icons, time_stamp
    From weather_events
    GROUP BY strftime('%H', time_stamp, '+30 minutes'), current_weather_description
    Order By counter desc limit 1
),
avg_temp_table(avg_temp, hour_seg, time_stamp) AS (
    select avg(current_temperatures) as avg_temp, CASE WHEN strftime('%M', time_stamp) < '30'
                                                  THEN strftime('%H', time_stamp)
                                                  ELSE strftime('%H', time_stamp, '+1 hours') END as hour_seg,
           time_stamp
    from weather_events
    group by strftime('%H', time_stamp, '+30 minutes')
    order by hour_seg desc
)

Select hourly, current_weather_description
from desc_table
join avg_temp_table
on desc_table.hourly=avg_temp_table.hour_seg

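Basically, I have some weather data that I group into hourly intervals (offset by 30 minutes). I want to count how many times each weather description (and its matching icon) occurs within an interval and keep the one with the highest count (desc_table). Then I want the average temperature for that interval (avg_temp_table) (maybe I need a subquery?) and to join the two queries on the hour column.
I want my anchor to be based on the time of the query (now) and to count the occurrences; each following recursive member would then subtract one hour, move on to the next interval, count again, and so on.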
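Sample data, in the form {current_temperatures, current_weather_description, current_icons, time_stamp}; a regular data set would have more rows in each time segment:
"87" "Rain" "rainicon" "2016-01-20 02:15:08"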
"65" "Snow" "snowicon" "2016-01-20 02:39:08"
"49" "Rain" "rainicon" "2016-01-20 03:15:08"
"49" "Rain" "rainicon" "2016-01-20 03:39:08"
"46" "Clear" "clearicon" "2016-01-20 04:15:29"
"46" "Clear" "clearicon" "2016-01-20 04:38:53"
"46" "Cloudy" "cloudyicon" "2016-01-20 05:15:08"
"46" "Clear" "clearicon" "2016-01-20 05:39:08"
"45" "Clear" "clearicon" "2016-01-20 06:14:17"
"45" "Clear" "clearicon" "2016-01-20 06:34:23"
"45" "Clear" "clearicon" "2016-01-20 07:24:54"
"45" "Rain" "rainicon" "2016-01-20 07:44:41"
"43" "Rain" "rainicon" "2016-01-20 08:19:08"
"36" "Clear" "clearicon" "2016-01-20 08:39:08"
"35" "Meatballs" "meatballsicon" "2016-01-20 09:18:08"
"18" "Cloudy" "cloudyicon" "2016-01-20 09:39:08"

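The output is the join between the average temperature for the interval (avg_temp_table) and the output of the first aggregating CTE (desc_table), in the form {avg_temp, weather_description, current_icon}:
"87" "Rain" "rainicon"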
"57" "Rain" "rainicon"
"47" "Clear" "clearicon"
"46" "Clear" "clearicon"
"46" "Cloudy" "cloudyicon"
"45" "Clear" "clearicon"
"44" "Rain" "rainicon"
"36" "Clear" "clearicon"
"18" "Cloudy" "cloudyicon"

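Right now I get a "no such column" error, because both my anchor and my recursive member select from the weather_events table. When I change the FROM of the recursive member to desc_table, I get a "recursive aggregate queries not supported" error. But I do not want the recursive member to read from desc_table; I want to segment by hour, then walk through each hourly interval and get the counts. I guess I got the anchor wrong to begin with.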

Best answer

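I am still not sure how your recursive CTE was supposed to pick the most frequent weather description and its icon for each hour, but that is fine because, going by your verbal description, I think I have found a way to do it without recursion.
First, group the rows by hour and description, and count the rows in each group: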

SELECT
    strftime('%H', time_stamp, '+30 minutes') AS hour,
    current_weather_description,
    current_icons,
    COUNT(*) AS event_count
FROM
    weather_events
GROUP BY
    strftime('%H', time_stamp, '+30 minutes'),
    current_weather_description
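
As a quick check of how the '+30 minutes' modifier shifts rows into hourly buckets, here is a minimal sketch in Python using the standard sqlite3 module. Python is used only so the snippet can run on its own; the three timestamps are taken from the question's sample data, and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three timestamps from the question's sample data.
samples = [
    "2016-01-20 02:15:08",
    "2016-01-20 02:39:08",
    "2016-01-20 03:15:08",
]

# Adding 30 minutes before extracting the hour makes each bucket
# cover HH-1:30 through HH:29.
buckets = [
    conn.execute("SELECT strftime('%H', ?, '+30 minutes')", (ts,)).fetchone()[0]
    for ts in samples
]

for ts, bucket in zip(samples, buckets):
    print(ts, "->", bucket)
```

The 02:39 row lands in bucket 03 together with the 03:15 row, which is the half-hour offset the question asks for.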

Next, group the results of the above query by hour and get the maximum event count per hour:
SELECT
    hour,
    MAX(event_count) AS max_event_count
FROM
    (
        SELECT
            strftime('%H', time_stamp, '+30 minutes') AS hour,
            current_weather_description,
            current_icons,
            COUNT(*) AS event_count
        FROM
            weather_events
        GROUP BY
            strftime('%H', time_stamp, '+30 minutes'),
            current_weather_description
    ) AS s
GROUP BY
    hour

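That is still not quite what you want, because you actually want the description and icon that match the maximum count, not the count itself. That is easy to fix: just add those columns to the SELECT list without adding them to GROUP BY: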
SELECT
    hour,
    current_weather_description,
    current_icons,
    MAX(event_count) AS max_event_count
FROM
    (
        SELECT
            strftime('%H', time_stamp, '+30 minutes') AS hour,
            current_weather_description,
            current_icons,
            COUNT(*) AS event_count
        FROM
            weather_events
        GROUP BY
            strftime('%H', time_stamp, '+30 minutes'),
            current_weather_description
    ) AS s
GROUP BY
    hour

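You will still need to keep MAX(event_count) in the query for the trick to work. It works because in SQLite, when a SELECT statement contains a single MAX or a single MIN call, the values of any selected columns that are neither in GROUP BY nor aggregated are taken from the row that matches that MAX or MIN value. This non-standard extension to SQL is documented in the release notes for SQLite 3.7.11.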
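
To see that behaviour in isolation, here is a minimal sketch in Python with the standard sqlite3 module. The `counts` table and its rows are made up for illustration and stand in for the output of the inner grouping query; it assumes a bundled SQLite of version 3.7.11 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical pre-aggregated counts, one row per (hour, description).
conn.execute(
    "CREATE TABLE counts (hour TEXT, description TEXT, icon TEXT, event_count INTEGER)"
)
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?, ?)",
    [
        ("02", "Rain", "rainicon", 3),
        ("02", "Snow", "snowicon", 1),
        ("03", "Clear", "clearicon", 2),
        ("03", "Cloudy", "cloudyicon", 5),
    ],
)

# description and icon are "bare" columns: not grouped, not aggregated.
# With a single MAX() in the SELECT list, SQLite takes them from the
# row that holds the maximum within each group.
rows = conn.execute(
    """
    SELECT hour, description, icon, MAX(event_count)
    FROM counts
    GROUP BY hour
    ORDER BY hour
    """
).fetchall()

for row in rows:
    print(row)
```

If two descriptions tie for the maximum within an hour, which of the tied rows supplies the bare columns should not be relied upon.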
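So much for the desc_table CTE. As for the avg_temp_table CTE, there seems to be little wrong with your current approach, except that I would use the GROUP BY expression as the hour definition instead of the CASE expression you are using, for consistency; the time_stamp column also seems redundant in the results. So the slightly modified CTE would look like this: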
SELECT
    strftime('%H', time_stamp, '+30 minutes') AS hour,
    AVG(current_temperatures) AS avg_temp
FROM
    weather_events
GROUP BY
    strftime('%H', time_stamp, '+30 minutes')

Now you just need to join the two sets on the hour column and select the relevant columns for the final output:
SELECT
    t.avg_temp,
    d.current_weather_description,
    d.current_icons
FROM
    avg_temp_table AS t
    INNER JOIN desc_table AS d ON t.hour = d.hour
ORDER BY
    t.hour
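
For reference, the pieces above can be assembled into one statement and run end to end. The following is a sketch in Python with the standard sqlite3 module, assuming a SQLite build with CTE support (3.8.3 or later). The table layout follows the question, but the six rows are made up so that each bucket has one clear winner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE weather_events (
        current_temperatures INTEGER,
        current_weather_description TEXT,
        current_icons TEXT,
        time_stamp TEXT
    )
    """
)
# Made-up rows: three fall into bucket 02, three into bucket 03.
conn.executemany(
    "INSERT INTO weather_events VALUES (?, ?, ?, ?)",
    [
        (80, "Rain", "rainicon", "2016-01-20 02:05:00"),
        (70, "Rain", "rainicon", "2016-01-20 02:15:00"),
        (60, "Snow", "snowicon", "2016-01-20 02:20:00"),
        (50, "Clear", "clearicon", "2016-01-20 02:40:00"),
        (40, "Clear", "clearicon", "2016-01-20 03:10:00"),
        (30, "Cloudy", "cloudyicon", "2016-01-20 03:20:00"),
    ],
)

rows = conn.execute(
    """
    WITH desc_table AS (
        SELECT hour, current_weather_description, current_icons,
               MAX(event_count) AS max_event_count
        FROM (
            SELECT strftime('%H', time_stamp, '+30 minutes') AS hour,
                   current_weather_description, current_icons,
                   COUNT(*) AS event_count
            FROM weather_events
            GROUP BY strftime('%H', time_stamp, '+30 minutes'),
                     current_weather_description
        ) AS s
        GROUP BY hour
    ),
    avg_temp_table AS (
        SELECT strftime('%H', time_stamp, '+30 minutes') AS hour,
               AVG(current_temperatures) AS avg_temp
        FROM weather_events
        GROUP BY strftime('%H', time_stamp, '+30 minutes')
    )
    SELECT t.avg_temp, d.current_weather_description, d.current_icons
    FROM avg_temp_table AS t
    INNER JOIN desc_table AS d ON t.hour = d.hour
    ORDER BY t.hour
    """
).fetchall()

for row in rows:
    print(row)
```

Each output row pairs the average temperature of a bucket with the most frequent description and icon in that bucket.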

There you are. Now I would just like to address one question about the resulting query, namely:
Can the join be avoided?
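While the approach of the solution (get the descriptions and the average temperatures separately, then join the two sets together) is quite straightforward and makes perfect sense, it would be nice to avoid the join and do all the calculations at the same time. That would likely speed up the query, because the source table would be scanned only once. Can that be achieved?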
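As it happens, yes, it can. The main difficulty in combining the two parts is that the descriptions are obtained in two steps, while calculating the average temperature is a single-step operation. Simply putting AVG(current_temperatures) into the nested SELECT of the first CTE (which groups by hour and description) and then doing AVG on the results in the outer SELECT (which groups by hour) would not be mathematically equivalent to doing AVG once over the entire hour group.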
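Instead, you need to remember that AVG = SUM / COUNT. If you get the SUM and the COUNT in the first step, and then the SUM of the SUMs and the SUM of the COUNTs in the second step, you can get the average by dividing the first outer SUM by the second.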
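
A small worked example makes the difference concrete. This is a sketch in Python with the standard sqlite3 module; the `readings` table and its four temperatures are made up and represent a single hour with unevenly sized description groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical readings for one hour: three Rain rows, one Snow row.
conn.execute("CREATE TABLE readings (description TEXT, temp INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("Rain", 40), ("Rain", 50), ("Rain", 60), ("Snow", 10)],
)

# Average over every row in the hour: (40 + 50 + 60 + 10) / 4.
true_avg = conn.execute("SELECT AVG(temp) FROM readings").fetchone()[0]

# Average of the per-description averages: (50 + 10) / 2.
avg_of_avgs = conn.execute(
    """
    SELECT AVG(group_avg)
    FROM (SELECT AVG(temp) AS group_avg FROM readings GROUP BY description)
    """
).fetchone()[0]

# Sum of the per-description sums over sum of the per-description counts: 160 / 4.
sum_over_count = conn.execute(
    """
    SELECT SUM(total_temp) / SUM(event_count)
    FROM (
        SELECT SUM(temp) AS total_temp, COUNT(*) AS event_count
        FROM readings
        GROUP BY description
    )
    """
).fetchone()[0]

print(true_avg, avg_of_avgs, sum_over_count)
```

The average of averages weights the single Snow row as heavily as the three Rain rows, which is why it drifts away from the true hourly average, while the sum-over-count form reproduces it.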
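Here is the desc_table CTE, modified to combine both parts of the query (so it is no longer a CTE but a complete query). The changes that matter are the avg_temp expression in the outer SELECT and the total_temp column in the inner SELECT: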
SELECT
    SUM(total_temp) / SUM(event_count) AS avg_temp,
    current_weather_description,
    current_icons,
    MAX(event_count) AS max_event_count
FROM
    (
        SELECT
            strftime('%H', time_stamp, '+30 minutes') AS hour,
            current_weather_description,
            current_icons,
            COUNT(*) AS event_count,
            SUM(current_temperatures) AS total_temp
        FROM
            weather_events
        GROUP BY
            strftime('%H', time_stamp, '+30 minutes'),
            current_weather_description
    ) AS s
GROUP BY
    hour
ORDER BY
    hour
;

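Obviously, the max_event_count column is redundant for the output, yet it remains essential for the "greatest-n-per-group" method that the query relies on. Personally, I would not worry about one redundant column in this case, but if you have a strong reason to exclude it from the result set, you can use the above query as a derived table (again) and let the outermost SELECT pull all columns except max_event_count, for instance: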
SELECT
    avg_temp,
    current_weather_description,
    current_icons
FROM
    (
        SELECT
            hour,
            SUM(total_temp) / SUM(event_count) AS avg_temp,
            current_weather_description,
            current_icons,
            MAX(event_count) AS max_event_count
        FROM
            (
                SELECT
                    strftime('%H', time_stamp, '+30 minutes') AS hour,
                    current_weather_description,
                    current_icons,
                    COUNT(*) AS event_count,
                    SUM(current_temperatures) AS total_temp
                FROM
                    weather_events
                GROUP BY
                    strftime('%H', time_stamp, '+30 minutes'),
                    current_weather_description
            ) AS s
        GROUP BY
            hour
    ) AS s
ORDER BY
    hour desc
;

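As you can see, the middle-tier SELECT now also includes hour, which is needed by the ORDER BY at the outermost level. (I am assuming here that the order matters to the calling application.)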
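I would just like to mention one difference between the results of the two methods. In the first, AVG(current_temperatures) gives you a floating-point result. In the second, SUM(total_temp) / SUM(event_count) gives you an integer (as long as the temperatures are stored as integers). Since your expected results show integer averages, I guess this should not be a problem. However, if you later decide that you want more precision in the averages, keep in mind that you can replace the SUM function in SUM(total_temp) with the TOTAL function, which returns the same value as SUM but always as a real. In SQLite, dividing a real by an integer gives a real, so using TOTAL would give you the same result as AVG in the first method.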
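
The integer-versus-real difference can be checked with a minimal sketch in Python using the standard sqlite3 module. The `temps` table and its two values are made up and chosen so that the true average is not a whole number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical integer temperatures whose average is 76.5.
conn.execute("CREATE TABLE temps (temp INTEGER)")
conn.executemany("INSERT INTO temps VALUES (?)", [(87,), (66,)])

# SUM of integers is an integer, so SUM / COUNT truncates.
# TOTAL always returns a real, so TOTAL / COUNT keeps the fraction.
int_avg, real_avg, builtin_avg = conn.execute(
    "SELECT SUM(temp) / COUNT(*), TOTAL(temp) / COUNT(*), AVG(temp) FROM temps"
).fetchone()

print(int_avg, real_avg, builtin_avg)
```

Here the SUM-based division drops the fractional part, while the TOTAL-based division agrees with AVG.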

Regarding "android - Modifying a SQLite query that uses CTEs", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34822638/
