gpt4 book ai didi

python - np.percentile Athena SQL 等价物

转载 作者:行者123 更新时间:2023-12-05 05:35:55 29 4
gpt4 key购买 nike

Python 允许我以 1 的步长获取值列表的百分位数 0 到 100,如下所示:

import numpy as np

a = np.array([1,2,3,4,5,6,7,8,9,10])
np.percentile(a,np.arange(0,101,1),interpolation='higher')

结果:

array([ 1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  3,  3,  3,  3,  3,
3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8,
8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10])

我正在尝试使用 Athena SQL 实现相同的结果,但我还没有成功。
虽然我尝试过使用 NTILE,但我可能遗漏了一个关键概念:

-- sample data
WITH dataset (val) AS (
VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)
)

select *,NTILE(101) OVER(ORDER BY val) calculated_ntile
from dataset

有没有办法获得与 Python 结果完全等同的 Athena SQL 结果?

更新:我编写了以下查询:

WITH dataset (val) AS (
VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)
)

SELECT approx_percentile(val, ARRAY[0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8, 0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99] ) from dataset

输出的几乎是我需要的:

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]

我需要修改什么?

最佳答案

由于没有明确定义百分比计算的不同实现方式,我最终编写了一个相当于 Athena 的 approx_percentile 的 python:

def approx_percentile(a,p):
a = np.sort(a)
n = a.shape[0]
rank = (p / 100) * n
return a[0 if math.ceil(rank) == 0 else math.ceil(rank)-1]

关于python - np.percentile Athena SQL 等价物,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/73402030/

29 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com