
SQL: fast cumulative frequency query (Postgres)

Reposted. Author: 行者123. Updated: 2023-11-29 13:07:22

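I want to get cumulative frequency data out of our database. I created a simple temporary table that holds every distinct status-update count we have seen, together with the number of users who have that many status updates.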

     Table "pg_temp_4.statuses_count_tmp"
     Column     |  Type   | Modifiers
----------------+---------+-----------
 statuses_count | integer |
 frequency      | bigint  |
Indexes:
    "statuses_count_idx" UNIQUE, btree (statuses_count)

My current query is:

select statuses_count,
       frequency / (select * from total_statuses)::float,
       (select sum(frequency) / (select * from total_statuses)::float AS percentage
          from statuses_count_tmp
         WHERE statuses_count <= SCT.statuses_count) AS cumulative_percent
  FROM statuses_count_tmp AS SCT
 ORDER BY statuses_count DESC;

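But this takes quite a long time, and the work grows fast: for each of my ~50,000 rows the subquery re-sums every row with a smaller or equal statuses_count, so in total the query reads roughly n(n+1)/2 rows, about 1.25 billion (quadratic growth). Sitting here watching the query grind along, I'm hoping there is a better solution I'm missing.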
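As a rough model of that growth (an illustration in Python, not a measurement of PostgreSQL), the sketch below counts the rows the correlated subquery visits. It assumes every outer row scans all rows whose statuses_count is less than or equal to its own, and that the counts are distinct, as the UNIQUE index guarantees. The function names are hypothetical.

```python
def correlated_row_reads(counts):
    """Brute-force model of the correlated subquery: for each outer row,
    visit every row whose statuses_count is <= the outer row's."""
    reads = 0
    for outer in counts:
        reads += sum(1 for inner in counts if inner <= outer)
    return reads


def closed_form_reads(n):
    """Same total for n distinct counts: 1 + 2 + ... + n = n(n+1)/2."""
    return n * (n + 1) // 2


def running_sum_reads(n):
    """A single ordered pass with a running total reads each row once."""
    return n


sample = list(range(10))  # ten distinct statuses_count values
print(correlated_row_reads(sample), closed_form_reads(len(sample)))  # 55 55
print(closed_form_reads(50_000))  # 1250025000
print(running_sum_reads(50_000))  # 50000
```

So the cost is quadratic rather than factorial, but ~1.25 billion row visits is still vastly more than the ~50,000 a single ordered pass needs.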

I'm hoping to get something like this:

0  0.26975161  0.26975161
1  0.15306534  0.42281695
2  0.05513516  0.47795211
3  0.03050646  0.50845857
4  0.02064444  0.52910301
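The third column of the desired output is simply a running sum of the second, taken in ascending statuses_count order (note that the current query sorts descending). A quick Python check on the five sample rows above, using itertools.accumulate, confirms this reading up to the 8-digit rounding:

```python
from itertools import accumulate

# Second column of the desired output: each count's share of all users.
shares = [0.26975161, 0.15306534, 0.05513516, 0.03050646, 0.02064444]
# Third column exactly as shown in the question.
expected_cumulative = [0.26975161, 0.42281695, 0.47795211, 0.50845857, 0.52910301]

cumulative = list(accumulate(shares))  # running sum, row by row
for got, want in zip(cumulative, expected_cumulative):
    # Tolerance because the published figures are rounded to 8 places.
    assert abs(got - want) < 1e-7, (got, want)

print([round(c, 8) for c in cumulative])
# [0.26975161, 0.42281695, 0.47795211, 0.50845857, 0.52910301]
```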

Best answer

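Assuming you have PostgreSQL 8.4 or later, this should be solvable with window functions. I'm guessing total_statuses is a view or temp table along the lines of select sum(frequency) from statuses_count_tmp? Here I've written it as a CTE, which should make it compute the result only once for the duration of the statement: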

with total_statuses as (select sum(frequency) from statuses_count_tmp)
select statuses_count,
       frequency / (select * from total_statuses) as frequency,
       -- running total over all rows up to and including this statuses_count
       sum(frequency) over (order by statuses_count)
         / (select * from total_statuses) as cumulative_frequency
  from statuses_count_tmp
 order by statuses_count;
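To check that the window-function rewrite returns the same numbers as the original correlated subquery, here is a small Python sketch that runs both shapes of query against an in-memory SQLite database. This is SQLite, not PostgreSQL, and it assumes SQLite 3.25 or later (the first release with window functions); the toy rows are made up. One deliberate difference: SQLite's sum() over integers yields an integer, so the total is CAST to REAL to avoid integer division, whereas in PostgreSQL sum(bigint) already returns numeric.

```python
import sqlite3

# Hypothetical toy data: (statuses_count, frequency) pairs, 100 users total.
rows = [(0, 50), (1, 30), (2, 12), (3, 5), (7, 3)]

conn = sqlite3.connect(":memory:")
conn.execute("create table statuses_count_tmp "
             "(statuses_count integer unique, frequency integer)")
conn.executemany("insert into statuses_count_tmp values (?, ?)", rows)

# Same shape as the answer's query: running total via a window function.
window_sql = """
with total_statuses as (select cast(sum(frequency) as real) as total
                          from statuses_count_tmp)
select statuses_count,
       frequency / (select total from total_statuses),
       sum(frequency) over (order by statuses_count)
         / (select total from total_statuses)
  from statuses_count_tmp
 order by statuses_count
"""

# Same shape as the question's query: one re-aggregation per outer row.
correlated_sql = """
with total_statuses as (select cast(sum(frequency) as real) as total
                          from statuses_count_tmp)
select statuses_count,
       frequency / (select total from total_statuses),
       (select sum(frequency) from statuses_count_tmp
         where statuses_count <= sct.statuses_count)
         / (select total from total_statuses)
  from statuses_count_tmp as sct
 order by statuses_count
"""

window_rows = conn.execute(window_sql).fetchall()
correlated_rows = conn.execute(correlated_sql).fetchall()
assert window_rows == correlated_rows  # both formulations agree

for row in window_rows:
    print(row)
```

The results match, but the window version computes the running total in one ordered pass instead of re-aggregating for every row.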

If you don't have 8.4's window functions, your best option is to iterate over the data procedurally in a PL/pgSQL function:

create type cumulative_sum_type as (
    statuses_count       int,
    frequency            numeric,
    cumulative_frequency numeric
);

create or replace function cumulative_sum()
returns setof cumulative_sum_type
strict stable language plpgsql as $$
declare
    running_total bigint := 0;  -- users counted so far
    total         bigint;       -- users overall (the denominator)
    data_in       record;
    data_out      cumulative_sum_type;
begin
    -- One aggregate pass for the denominator.
    select sum(frequency) into total from statuses_count_tmp;
    -- One ordered pass, carrying the running total forward row by row.
    for data_in in
        select statuses_count, frequency
          from statuses_count_tmp
         order by statuses_count
    loop
        running_total := running_total + data_in.frequency;
        data_out.statuses_count       := data_in.statuses_count;
        data_out.frequency            := data_in.frequency::numeric / total;
        data_out.cumulative_frequency := running_total::numeric / total;
        return next data_out;
    end loop;
    return;
end;
$$;

select * from cumulative_sum();
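The PL/pgSQL function makes two passes: one aggregate for the total, then one ordered scan that carries running_total forward, so the work is linear in the number of rows. The same logic in Python, as a sketch that uses fractions.Fraction to stand in for PostgreSQL's exact numeric type (an assumption for illustration; plain floats would also work but round):

```python
from fractions import Fraction
from typing import Iterable, Iterator, NamedTuple, Tuple


class CumulativeRow(NamedTuple):
    # Mirrors the fields of cumulative_sum_type.
    statuses_count: int
    frequency: Fraction
    cumulative_frequency: Fraction


def cumulative_sum(rows: Iterable[Tuple[int, int]]) -> Iterator[CumulativeRow]:
    """rows are (statuses_count, frequency) pairs with unique counts."""
    ordered = sorted(rows)                      # "order by statuses_count"
    total = sum(freq for _, freq in ordered)    # first pass: denominator
    running_total = 0
    for statuses_count, frequency in ordered:   # second pass: running sum
        running_total += frequency
        yield CumulativeRow(statuses_count,
                            Fraction(frequency, total),
                            Fraction(running_total, total))


for row in cumulative_sum([(1, 30), (0, 50), (2, 20)]):
    print(row)
```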

For SQL: fast cumulative frequency query (Postgres), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/4673393/
