google-bigquery - One-hot encoding (dummy variables) with BigQuery

Reposted. Author: 行者123. Updated: 2023-12-05 02:51:37

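I want to use BigQuery instead of Pandas to create dummy variables (one-hot encoding) for my categories. I will end up with about 200 columns, so I cannot write them out by hand and hard-code them.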

Test dataset (the real one has many more variables than this):

WITH table AS (
SELECT 1001 as ID, 'blue' As Color, 'big' AS size UNION ALL
SELECT 1002 as ID, 'yellow' As Color, 'medium' AS size UNION ALL
SELECT 1003 as ID, 'red' As Color, 'small' AS size UNION ALL
SELECT 1004 as ID, 'blue' As Color, 'small' AS size)

SELECT *
FROM table

Expected result:

[image: expected one-hot encoded output, one column per category value]

Best answer

The following is for BigQuery Standard SQL:

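-- Arrays that will hold the distinct values of each categorical column.
DECLARE Colors, Sizes ARRAY<STRING>;

-- Populate both arrays in a single scan of the source table.
SET (Colors, Sizes) = (SELECT AS STRUCT ARRAY_AGG(DISTINCT Color), ARRAY_AGG(DISTINCT Size) FROM `project.dataset.table`);

-- Build one COUNTIF column per distinct value, then run the generated statement.
EXECUTE IMMEDIATE '''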
CREATE TEMP TABLE result AS -- added line
SELECT ID, ''' || (
SELECT STRING_AGG("COUNTIF(Color = '" || Color || "') AS Color_" || Color ORDER BY Color)
FROM UNNEST(Colors) AS Color
) || (
SELECT ', ' || STRING_AGG("COUNTIF(Size = '" || Size || "') AS Size_" || Size ORDER BY Size)
FROM UNNEST(Sizes) AS Size
) || '''
FROM `project.dataset.table`
GROUP BY ID
ORDER BY ID
'''; -- added `;`

SELECT * FROM result; -- added line
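
To make the string concatenation easier to follow, here is roughly what the statement passed to EXECUTE IMMEDIATE expands to for the four sample rows in the question. This is a hand-written sketch, not captured output, and it assumes the sample data has been loaded into `project.dataset.table`.

```sql
-- Hand-expanded form of the generated statement for the sample data.
-- One COUNTIF column per distinct value, ordered alphabetically within each variable.
CREATE TEMP TABLE result AS
SELECT ID,
  COUNTIF(Color = 'blue') AS Color_blue,
  COUNTIF(Color = 'red') AS Color_red,
  COUNTIF(Color = 'yellow') AS Color_yellow,
  COUNTIF(Size = 'big') AS Size_big,
  COUNTIF(Size = 'medium') AS Size_medium,
  COUNTIF(Size = 'small') AS Size_small
FROM `project.dataset.table`
GROUP BY ID
ORDER BY ID;
```

Because each ID appears only once in the sample, every COUNTIF evaluates to 0 or 1. If an ID could appear on several rows, these columns would hold counts rather than strict indicators.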

If applied to the sample data in your question, the output is as follows:

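ID    Color_blue  Color_red  Color_yellow  Size_big  Size_medium  Size_small
1001  1           0          0             1         0            0
1002  0           0          1             0         1            0
1003  0           1          0             0         0            1
1004  1           0          0             0         0            1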
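
The script above assumes every category value is non-NULL and is safe to use as part of a column name. With around 200 real-world values that may not hold, so below is a hedged, self-contained variant. The temp table `sample`, the variable `select_list`, and the rule that replaces non-alphanumeric characters with `_` are all assumptions added for illustration and are not part of the original answer.

```sql
-- Variable declarations must come first in a BigQuery script.
DECLARE Colors, Sizes ARRAY<STRING>;
DECLARE select_list STRING;  -- hypothetical helper holding the generated column list

-- Hypothetical stand-in for `project.dataset.table`, using the question's sample rows.
CREATE TEMP TABLE sample AS
SELECT 1001 AS ID, 'blue' AS Color, 'big' AS Size UNION ALL
SELECT 1002, 'yellow', 'medium' UNION ALL
SELECT 1003, 'red', 'small' UNION ALL
SELECT 1004, 'blue', 'small';

-- IGNORE NULLS keeps NULL categories out of the arrays.
SET (Colors, Sizes) = (
  SELECT AS STRUCT
    ARRAY_AGG(DISTINCT Color IGNORE NULLS),
    ARRAY_AGG(DISTINCT Size IGNORE NULLS)
  FROM sample
);

-- %T renders each value as a quoted SQL literal; REGEXP_REPLACE turns
-- characters that are not valid in an unquoted column name into underscores.
SET select_list = (
  SELECT STRING_AGG(
    FORMAT('COUNTIF(Color = %T) AS Color_%s', c, REGEXP_REPLACE(c, r'[^A-Za-z0-9_]', '_')),
    ', ' ORDER BY c)
  FROM UNNEST(Colors) AS c
) || ', ' || (
  SELECT STRING_AGG(
    FORMAT('COUNTIF(Size = %T) AS Size_%s', s, REGEXP_REPLACE(s, r'[^A-Za-z0-9_]', '_')),
    ', ' ORDER BY s)
  FROM UNNEST(Sizes) AS s
);

-- Run the generated query against the temp table.
EXECUTE IMMEDIATE FORMAT('''
  SELECT ID, %s
  FROM sample
  GROUP BY ID
  ORDER BY ID
''', select_list);
```

One caveat with this sketch: two different values that sanitize to the same name (for example `light blue` and `light-blue`) would produce duplicate column aliases, so a real pipeline would need an extra disambiguation step.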

Regarding google-bigquery - One-hot encoding (dummy variables) with BigQuery, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/63043751/
