gpt4 book ai didi

google-bigquery - 将 bigquery json 字符串转换为列

转载 作者:行者123 更新时间:2023-12-05 03:58:47 25 4
gpt4 key购买 nike

对于以字符串形式出现的 json 数据,我希望有类似 JSON_EXTRACT_SCALAR 的东西,但对于灵活数量的结果列。

这里是示例数据——不同的行可以有不同的列名,json可以嵌套:

WITH `my_table` AS (
SELECT '{"sku_types":"{\"id\":\"5433306\",\"product_code\":\"adfklj_ewkj\"}","additional_info":"Face 30 ml","stock_level":"20+"}' as json_string
union all
SELECT '{"additional_info":"Face 100 ml","offer_info":"30%"}' as json_string
)
SELECT *
from my_table;

我想将此数据提取到单独的列中:sku_types.id、sku_types.product_code、additional_info、stock_level、offer_info

这可以在 SQL 中完成还是需要 javascript?

我事先不知道 json 字段的名称,所以我无法使用 JSON_EXTRACT_SCALARJSON_EXTRACT 执行此操作。

最佳答案

以下是 BigQuery 标准 SQL 的示例

#standardSQL
CREATE TEMPORARY FUNCTION parseJson(y STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
var z = new Array();
processKey(JSON.parse(y), '');
function processKey(node, parent) {
Object.keys(node).map(function(key) {
value = node[key].toString();
if (value !== '[object Object]') {
if (parent !== '' && parent.substr(parent.length-1) !== '.') {
z.push(parent + '.' + key + ':' + value)
} else {
z.push(key + ':' + value)
}
} else {
if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
processKey(node[key], parent + key);
};
});
};
return z
""";
WITH `my_table` AS (
SELECT 1 id, '{"sku_types":{"id":"5433306","product_code":"adfklj_ewkj"},"additional_info":"Face 30 ml","stock_level":"20+"}' AS json_string UNION ALL
SELECT 2, '{"additional_info":"Face 100 ml","offer_info":"30%"}' AS json_string
)
SELECT id,
ARRAY(
SELECT AS STRUCT SPLIT(kv, ':')[OFFSET(0)] key, SPLIT(kv, ':')[SAFE_OFFSET(1)] value
FROM UNNEST(parseJson(json_string)) kv
) params
FROM my_table

结果

Row id  params.key              params.value     
1 1 sku_types.id 5433306
sku_types.product_code adfklj_ewkj
additional_info Face 30 ml
stock_level 20+
2 2 additional_info Face 100 ml
offer_info 30%

正如您所看到的,而不是将所有可能的属性解析到单独的列中(这在这里是不可能的 - 除非您事先知道它们) - 上面的方法将它们扁平化为参数数组中的键:值对

注意:在上面的示例中,我使用 : 来构造键值对,然后拆分它们。如果您希望值具有此字符 - 您可以调整代码而不是 : 使用更独特的东西 - 例如 ::::::::

Quick update to address comment:
... problem is that some of json values are null, in which case it throws and error

#standardSQL
CREATE TEMPORARY FUNCTION parseJson(y STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
var z = new Array();
processKey(JSON.parse(y), '');
function processKey(node, parent) {
Object.keys(node).map(function(key) {
if (!node[key]) {
value = 'n/a'
} else {
value = node[key].toString();
}
if (value !== '[object Object]') {
if (parent !== '' && parent.substr(parent.length-1) !== '.') {
z.push(parent + '.' + key + ':' + value)
} else {
z.push(key + ':' + value)
}
} else {
if (parent !== '' && parent.substr(parent.length-1) !== '.') {parent += '.'};
processKey(node[key], parent + key);
};
});
};
return z
""";
WITH `my_table` AS (
SELECT 1 id, '{"sku_types":{"id":"5433306","product_code":"adfklj_ewkj"},"additional_info":"Face 30 ml","stock_level":"20+"}' AS json_string UNION ALL
SELECT 2, '{"additional_info":"Face 100 ml","offer_info":"30%"}' AS json_string union all
SELECT 3 as id , '{"offer_info":"30%", "price":null}' AS json_string
)
SELECT id,
ARRAY(
SELECT AS STRUCT SPLIT(kv, ':')[OFFSET(0)] key, SPLIT(kv, ':')[SAFE_OFFSET(1)] value
FROM UNNEST(parseJson(json_string)) kv
) params
FROM my_table

结果

Row id  params.key              params.value     
1 1 sku_types.id 5433306
sku_types.product_code adfklj_ewkj
additional_info Face 30 ml
stock_level 20+
2 2 additional_info Face 100 ml
offer_info 30%
3 3 offer_info 30%
price n/a

如您所见 - 我正在用 'n/a' 替换空值,但您可以应用任何您想要的逻辑

关于google-bigquery - 将 bigquery json 字符串转换为列,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57728144/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com