google-bigquery - percent_rank() in BigQuery that conditionally includes only certain rows

Reposted. Author: 行者123. Updated: 2023-12-05 03:51:49

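I previously posted this related question. The solution to that question works for the analytic function rank(), but it does not work correctly for percent_rank(). To demonstrate, I have the following dummy table: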

with 
table as (
select 'a' as category, 1 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 2 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 3 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 4 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 5 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 6 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 7 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 8 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 9 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 10 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 11 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 12 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 13 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 14 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 15 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 16 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 17 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 18 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 19 as num, 15 as num2, 7 as cutoff union all
select 'a' as category, 20 as num, 5 as num2, 7 as cutoff union all
select 'a' as category, 21 as num, 5 as num2, 7 as cutoff
)
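
I need percent_rank() over the num column. However, the percentile rank should only take into account rows where num2 >= cutoff. I tried the following two approaches to compute the percentile: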

select
*,
if(num2 >= cutoff,
percent_rank() over(
partition by category
order by num
), null) as pctile1,
if(num2 >= cutoff,
percent_rank() over(
partition by category
order by if (num2 >= cutoff, num, null) ASC
), null) as pctile2
from table
order by num asc

(Screenshot of the query output; image not reproduced here.)

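Neither pctile1 nor pctile2 is correct. To see why, look at row 10, where pctile1 == 0.45 and pctile2 == 0.60. Among the qualifying values, however, this should be a lower percentile. Only 2 qualifying values are below num == 10 (namely 1 and 2), while many values greater than 10 qualify (11 - 19). The correct percentile for num == 10, given the num2 and cutoff values, should be close to 30%, because 10 is the third lowest of the 11 qualifying values.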
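
For reference, the arithmetic behind those numbers can be checked with a short Python sketch. It assumes PERCENT_RANK is computed as (rank - 1) / (rows in partition - 1) and that NULL sort keys come first in ascending order, which is BigQuery's default. The helper percent_rank and the variable names are hypothetical, written only for this illustration. Under that definition, num == 10 ranked among the qualifying rows only comes out at 0.2.

```python
# Sample data from the question as (num, num2); cutoff is 7 on every row.
CUTOFF = 7
NUM2_IS_15 = {1, 2, 10, 11, 12, 13, 14, 15, 16, 18, 19}
rows = [(num, 15 if num in NUM2_IS_15 else 5) for num in range(1, 22)]


def percent_rank(sort_keys, key):
    """(rank - 1) / (row count - 1), with rank = 1 + number of smaller keys."""
    rank = 1 + sum(1 for k in sort_keys if k < key)
    return (rank - 1) / (len(sort_keys) - 1)


all_nums = [num for num, _ in rows]
qualifying = [num for num, num2 in rows if num2 >= CUTOFF]

# pctile1: every row in the category takes part in the ranking.
pctile1 = percent_rank(all_nums, 10)

# pctile2: non-qualifying rows get a NULL sort key, which sorts first in
# ascending order; -inf stands in for NULL here.
null_first = [num if num2 >= CUTOFF else float("-inf") for num, num2 in rows]
pctile2 = percent_rank(null_first, 10)

# Intended behaviour: only qualifying rows take part in the ranking.
wanted = percent_rank(qualifying, 10)

print(pctile1, pctile2, wanted)
```

In both attempts the non-qualifying rows still count toward the rank and toward the partition size, which is why wrapping the result in if(...) alone does not help.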

Note that I should not filter the table to remove the rows I do not want a percent_rank() for, because I need to keep those rows.

Edit

I am not sure how to reduce the image size, but I am working on it.

Best answer

I would simply go with the option below

#standardSQL
SELECT *,
PERCENT_RANK() OVER(PARTITION BY category ORDER BY num) AS pctile
FROM table WHERE num2 >= cutoff
UNION ALL
SELECT *, NULL
FROM table WHERE num2 < cutoff
-- ORDER BY num

If applied to the sample data in your question, the output is

Row  category  num  num2  cutoff  pctile
1    a         1    15    7       0.0
2    a         2    15    7       0.1
3    a         3    5     7       null
4    a         4    5     7       null
5    a         5    5     7       null
6    a         6    5     7       null
7    a         7    5     7       null
8    a         8    5     7       null
9    a         9    5     7       null
10   a         10   15    7       0.2
11   a         11   15    7       0.3
12   a         12   15    7       0.4
13   a         13   15    7       0.5
14   a         14   15    7       0.6
15   a         15   15    7       0.7
16   a         16   15    7       0.8
17   a         17   5     7       null
18   a         18   15    7       0.9
19   a         19   15    7       1.0
20   a         20   5     7       null
21   a         21   5     7       null

The above looks easy to read to me, but the version below is most likely what you want

SELECT *,
IF(num2 >= cutoff,
PERCENT_RANK() OVER(PARTITION BY IF(num2 >= cutoff, category, NULL) ORDER BY num),
NULL) AS pctile
FROM table
-- ORDER BY num

Obviously, the result is the same as above
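
The single-query version can be tried out locally as a rough check. The sketch below is a stand-in under stated assumptions, not BigQuery itself: it uses Python's sqlite3 module (window functions need SQLite 3.25 or later), a hypothetical table named t, and CASE WHEN in place of BigQuery's IF. The idea it illustrates is that rows failing the condition fall into a separate NULL partition, so they no longer affect the ranks or the row count of the qualifying rows.

```python
import sqlite3

# Rebuild the sample data: num2 is 15 for the qualifying rows, 5 otherwise.
NUM2_IS_15 = {1, 2, 10, 11, 12, 13, 14, 15, 16, 18, 19}
data = [("a", n, 15 if n in NUM2_IS_15 else 5, 7) for n in range(1, 22)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (category TEXT, num INTEGER, num2 INTEGER, cutoff INTEGER)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", data)

# Conditional partition key: qualifying rows keep their category,
# all other rows share a NULL partition whose result is discarded.
query = """
SELECT num,
       CASE WHEN num2 >= cutoff THEN
           PERCENT_RANK() OVER (
               PARTITION BY CASE WHEN num2 >= cutoff THEN category END
               ORDER BY num)
       END AS pctile
FROM t
ORDER BY num
"""
result = {num: pctile for num, pctile in conn.execute(query)}

for num in (1, 3, 10, 19):
    print(num, result[num])
```

All 21 rows stay in the output, which is the requirement from the question; only the rows with num2 >= cutoff receive a non-null pctile.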

Regarding google-bigquery - percent_rank() in BigQuery that conditionally includes only certain rows, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62630152/