
statistics - Google BigQuery APPROX_QUANTILES and getting true quartiles

Reposted. Author: 行者123. Updated: 2023-12-04 01:31:37

According to the docs:

Returns the approximate boundaries for a group of expression values, where number represents the number of quantiles to create. This function returns an array of number + 1 elements, where the first element is the approximate minimum and the last element is the approximate maximum.
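To make the documented shape concrete, here is a minimal Python sketch that computes exact (not approximate) boundaries with the same layout: number + 1 values, where the first is the minimum and the last is the maximum. The function name quantile_boundaries is hypothetical, and it assumes a simple nearest-rank rule chosen for illustration; it is not BigQuery's approximation algorithm.

```python
def quantile_boundaries(values, number):
    """Hypothetical exact analogue of the documented APPROX_QUANTILES shape.

    Returns number + 1 boundaries: element 0 is the minimum, the last
    element is the maximum, and the ones in between are nearest-rank
    cut points (an assumption made for illustration only).
    """
    if number < 1:
        raise ValueError("number must be >= 1")
    data = sorted(values)
    if not data:
        return []
    size = len(data)
    boundaries = []
    for k in range(number + 1):
        # Nearest-rank: the k-th boundary sits at rank ceil(k * size / number).
        rank = -(-k * size // number)  # integer ceiling division
        # Rank 0 (k == 0) maps to the first element, i.e. the minimum.
        boundaries.append(data[max(rank - 1, 0)])
    return boundaries


print(quantile_boundaries(range(1, 101), 4))  # [1, 25, 50, 75, 100]
```

With 1 through 100 as input and number = 4, this nearest-rank rule happens to yield the same five values as the baseline output shown in the answer, including the minimum and maximum at either end.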



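It sounds like, if I want true quartiles, I need to use APPROX_QUANTILES(values, 4), which will return [minvalue, 1st quartile, 2nd quartile, 3rd quartile, maxvalue].
According to https://en.wikipedia.org/wiki/Quartile, the set of quartiles contains 3 data points, none of which is the minimum or maximum of the data.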
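For comparison, Python's standard-library statistics.quantiles(data, n=4) returns only the n - 1 = 3 cut points, which matches the Wikipedia definition: no minimum and no maximum. It also shows that exact quartile values depend on the interpolation method; "exclusive" (the default) and "inclusive" are the two methods the library offers. The input 1 through 100 mirrors the answer's example.

```python
import statistics

data = list(range(1, 101))

# n=4 yields the three quartile cut points; min and max are not included.
exclusive = statistics.quantiles(data, n=4)  # default method="exclusive"
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(exclusive)  # [25.25, 50.5, 75.75]
print(inclusive)  # [25.75, 50.5, 75.25]
```

Both methods interpolate between neighboring data points, so for the same input they can disagree on the first and third quartiles while agreeing on the median.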

Is my assumption correct? Will APPROX_QUANTILES(values, 4) return the true quartiles?

Best answer

As a baseline, here is the output without any modification, using the numbers 1 through 100 as input:

SELECT APPROX_QUANTILES(x, 4) AS output
FROM UNNEST(GENERATE_ARRAY(1, 100)) AS x;
+----------------------------+
| output |
+----------------------------+
| ["1","25","50","75","100"] |
+----------------------------+

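The output includes the minimum (1) and the maximum (100). If you want only the quartiles, you need to remove those two values from the array. For readability and composability, it's best to do this with a temporary SQL UDF. I'm using INT64 for the element type here, but you could use a different element type instead: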
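CREATE TEMP FUNCTION StripFirstLast(arr ARRAY<INT64>) AS (
  -- Keep every element except the first (offset 0) and the last;
  -- ORDER BY off preserves the original element order.
  ARRAY(SELECT x FROM UNNEST(arr) AS x WITH OFFSET AS off
        WHERE off BETWEEN 1 AND ARRAY_LENGTH(arr) - 2
        ORDER BY off)
);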

SELECT
  APPROX_QUANTILES(x, 4) AS output,
  StripFirstLast(APPROX_QUANTILES(x, 4)) AS quartiles
FROM UNNEST(GENERATE_ARRAY(1, 100)) AS x;
+----------------------------+------------------+
| output | quartiles |
+----------------------------+------------------+
| ["1","25","50","75","100"] | ["25","50","75"] |
+----------------------------+------------------+

You can see that the quartiles array contains only the desired values.
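The same stripping step can be sketched in Python for readers who post-process the result on the client side. strip_first_last is a hypothetical helper that mirrors StripFirstLast; it is written generically, in the spirit of the answer's remark that other element types work too. The boundary list is the baseline output from the answer, converted to integers.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def strip_first_last(arr: Sequence[T]) -> list[T]:
    """Hypothetical counterpart of StripFirstLast.

    Drops the first (minimum) and last (maximum) boundary and keeps the
    interior cut points. Inputs with fewer than 3 elements yield [].
    """
    return list(arr[1:-1])


# Baseline output of APPROX_QUANTILES(x, 4) from the answer, as integers.
boundaries = [1, 25, 50, 75, 100]
quartiles = strip_first_last(boundaries)
q1, q2, q3 = quartiles
print(quartiles)  # [25, 50, 75]
```

Slicing with arr[1:-1] returns an empty list for inputs shorter than three elements, which matches what the UDF returns for such arrays. Within BigQuery itself, a single quartile can also be read by position, e.g. APPROX_QUANTILES(x, 4)[OFFSET(1)] for the first quartile.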

Regarding "statistics - Google BigQuery APPROX_QUANTILES and getting true quartiles", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48326809/
