gpt4 book ai didi

Python Postgres 将列设置为百分位数

转载 作者:行者123 更新时间:2023-11-29 13:14:06 27 4
gpt4 key购买 nike

我有一个交易表,我想添加一个百分比列,根据金额列指定该月交易的百分比。

以四分位数而不是百分位数为例:

示例输入:

id | month | amount
1 | 1 | 1
2 | 1 | 2
3 | 1 | 5
4 | 1 | 3
5 | 2 | 1
6 | 2 | 3
1 | 2 | 5
1 | 2 | 7
1 | 2 | 9
1 | 2 | 11
1 | 2 | 15
1 | 2 | 16

示例输出

id | month | amount |  quartile
1 | 1 | 1 | 25
2 | 1 | 2 | 50
3 | 1 | 5 | 100
4 | 1 | 3 | 75
5 | 2 | 1 | 25
6 | 2 | 3 | 25
1 | 2 | 5 | 50
1 | 2 | 15 | 100
1 | 2 | 9 | 75
1 | 2 | 11 | 75
1 | 2 | 7 | 50
1 | 2 | 16 | 100

目前,我使用 postgres 的 percentile_cont 函数来确定不同百分位数的截止点的数量值,然后通过并相应地更新百分位数列。不幸的是,这种方法太慢了,因为我有很多不同的月份。关于如何更快地执行此操作的任何想法,最好将百分位数的计算和更新结合在一个 SQL 语句中。

我的代码:

num_buckets = 10

for i in range(num_buckets):
decimal_percentile = (i+1)*(1.0/num_buckets)
prev_decimal_percentile = i*1.0/num_buckets
percentile = int(decimal_percentile*100)
cursor.execute("SELECT month,
percentile_cont(%s) WITHIN GROUP (ORDER BY amount ASC),
percentile_cont(%s) WITHIN GROUP (ORDER BY amount ASC)
FROM transactions GROUP BY month;",
(prev_decimal_percentile, decimal_percentile))
iter_cursor = connection.cursor()
for data in cursor:
iter_cursor.execute("UPDATE transactions SET percentile=%s
WHERE month = %s
AND amount >= %s AND amount <= %s;",
(percentile, data[0], data[1], data[2]))

最佳答案

您可以在单个查询中执行此操作,例如 4 个存储桶:

update transactions t
set percentile = calc_percentile
from (
select distinct on (month, amount)
id,
month,
amount,
calc_percentile
from transactions
join (
select
bucket,
month as calc_month,
percentile_cont(bucket*1.0/4) within group (order by amount asc) as calc_amount,
bucket*100/4 as calc_percentile
from transactions
cross join generate_series(1, 4) bucket
group by month, bucket
) s on month = calc_month and amount <= calc_amount
order by month, amount, calc_percentile
) s
where t.month = s.month and t.amount = s.amount;

结果:

select *
from transactions
order by month, amount;

id | month | amount | percentile
----+-------+--------+------------
1 | 1 | 1 | 25
2 | 1 | 2 | 50
4 | 1 | 3 | 75
3 | 1 | 5 | 100
5 | 2 | 1 | 25
6 | 2 | 3 | 25
1 | 2 | 5 | 50
1 | 2 | 7 | 50
1 | 2 | 9 | 75
1 | 2 | 11 | 75
1 | 2 | 15 | 100
1 | 2 | 16 | 100
(12 rows)

顺便说一句,id 应该是一个主键,然后它可以用于连接以获得更好的性能。

DbFiddle.

关于Python Postgres 将列设置为百分位数,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/51319459/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com