
google-bigquery - Is it possible to count the number of occurrences of each key in a JSON column?

Reposted. Author: 行者123. Updated: 2023-12-02 20:56:46

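I have a BigQuery table with a column that contains JSON.

I want to output the number of times each key appears in the column, sorted by count in descending order. The value associated with every key is 1.

Each object has a known/finite number of keys, but I would rather not rely on that in case the largest object changes.

The overall number of keys is known/finite, but I don't want to depend on enumerating/updating a list when it changes.

Example input: three rows with one column named "json"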

[
{"json": "{'A': 1}"},
{"json": "{'B': 1}"},
{"json": "{'B': 1, 'C': 1}"}
]

Example output: three rows with two columns named "key" and "count"

[
{"key": "B", "count": 2},
{"key": "A", "count": 1},
{"key": "C", "count": 1}
]

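Given that I don't want to rely on a finite number of keys, either per object or overall, what is the simplest way to do this?

Best answer

The following is for BigQuery Standard SQL.

See Enabling Standard SQL and User-Defined Functions.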

CREATE TEMPORARY FUNCTION parseJson(y STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  // Collect every leaf key, i.e. every key whose value is not a nested object.
  var z = [];
  function processKey(node) {
    Object.keys(node).forEach(function(key) {
      var value = node[key];
      if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
        processKey(value);  // nested object: descend and collect its keys instead
      } else {
        z.push(key);        // leaf value (including null and arrays)
      }
    });
  }
  processKey(JSON.parse(y));
  return z;
""";

WITH theTable AS (
SELECT '{"json":{"A":"1"}}' AS json UNION ALL
SELECT '{"json":{"B":"1"}}' AS json UNION ALL
SELECT '{"json":{"B":"1","C":"1"}}' AS json
)
SELECT key, COUNT(1) AS `count`
FROM theTable, UNNEST(parseJson(json)) AS key
GROUP BY key
ORDER BY 2 DESC

Output:

key count
B 2
A 1
C 1
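
The key-collecting logic of the parseJson UDF can also be tried outside BigQuery. The following is a minimal sketch, assuming a plain JavaScript runtime such as Node.js; the name collectLeafKeys is hypothetical and not part of the answer. It uses a typeof check to detect nested objects, so null values and arrays are treated as leaves.

```javascript
// Hypothetical standalone helper that mirrors the traversal in the UDF body.
// It is an illustration only and is not a BigQuery API.
function collectLeafKeys(jsonText) {
  const keys = [];
  function walk(node) {
    for (const key of Object.keys(node)) {
      const value = node[key];
      if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
        walk(value);    // nested object: only its inner keys are collected
      } else {
        keys.push(key); // leaf value (string, number, boolean, null, array)
      }
    }
  }
  walk(JSON.parse(jsonText));
  return keys;
}

console.log(collectLeafKeys('{"json":{"B":"1","C":"1"}}')); // [ 'B', 'C' ]
console.log(collectLeafKeys('{"A":"1"}'));                   // [ 'A' ]
```

Note that the wrapper key "json" is not returned, because its value is an object; this is why the query output lists only A, B and C.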

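Note: the parseJson UDF is generic enough to handle arbitrarily nested JSON objects, so you can try the code above with the following input and it will still work: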

WITH theTable AS (
SELECT '{"json":{"A":"1"}}' AS json UNION ALL
SELECT '{"json":{"B":"1"}}' AS json UNION ALL
SELECT '{"json":{"B":"1","C":"1"}}' AS json UNION ALL
SELECT '{"A":"1"}' AS json UNION ALL
SELECT '{"B":"1"}' AS json UNION ALL
SELECT '{"B":"1","C":"1"}' AS json
)

Output:

key count
B 4
A 2
C 2
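
To check these counts without running a query, the grouping and ordering can be emulated in plain JavaScript. This is only a sketch under assumptions: countKeys is a hypothetical name, the rows are the six sample strings above, and ties are broken alphabetically by key, which the SQL query itself does not guarantee.

```javascript
// Hypothetical helper: counts leaf keys across rows of JSON text,
// roughly what GROUP BY key / ORDER BY count DESC does in the query.
function countKeys(rows) {
  const counts = new Map();
  for (const text of rows) {
    const stack = [JSON.parse(text)]; // iterative traversal of nested objects
    while (stack.length > 0) {
      const node = stack.pop();
      for (const [key, value] of Object.entries(node)) {
        if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
          stack.push(value); // nested object: visit its keys later
        } else {
          counts.set(key, (counts.get(key) || 0) + 1);
        }
      }
    }
  }
  // Sort by count descending; the alphabetical tie-break is an assumption.
  return [...counts.entries()]
    .map(([key, count]) => ({ key, count }))
    .sort((a, b) => b.count - a.count || a.key.localeCompare(b.key));
}

const rows = [
  '{"json":{"A":"1"}}',
  '{"json":{"B":"1"}}',
  '{"json":{"B":"1","C":"1"}}',
  '{"A":"1"}',
  '{"B":"1"}',
  '{"B":"1","C":"1"}',
];
console.log(countKeys(rows));
```

For the six sample rows this yields B with a count of 4, followed by A and C with a count of 2 each, matching the output shown above.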

Added version for BigQuery Legacy SQL

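To keep things simple for presentation and further testing, I use the inline version of a Legacy SQL UDF here. The inline version is not officially supported in Legacy SQL, so if this approach works for you, you will need to convert it slightly. See BigQuery User-Defined Functions for details on UDFs in BigQuery Legacy SQL.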

SELECT key, COUNT(1) as cnt
FROM JS((
SELECT json FROM
(SELECT '{"json":{"A":"1"}}' AS json),
(SELECT '{"json":{"B":"1"}}' AS json),
(SELECT '{"json":{"B":"1","C":"1"}}' AS json),
(SELECT '{"A":"1"}' AS json),
(SELECT '{"B":"1"}' AS json),
(SELECT '{"B":"1","C":"1"}' AS json)
),
json, // Input columns
"[{name: 'parent', type:'string'}, // Output schema
{name: 'key', type:'string'},
{name: 'value', type:'string'}]",
"function(r, emit) { // The function
  processKey(JSON.parse(r.json), '');
  function processKey(node, parent) {
    Object.keys(node).forEach(function(key) {
      var value = node[key];
      if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
        // Nested object: extend the dotted path without changing it for sibling keys
        processKey(value, parent === '' ? key : parent + '.' + key);
      } else {
        emit({parent: parent, key: key, value: String(value)});
      }
    });
  }
}"
)
GROUP BY key
ORDER BY cnt DESC
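
The Legacy SQL function also emits a dotted parent path and the value for each leaf key, which the query above does not use. The sketch below shows what those records look like, assuming a plain JavaScript runtime where emit is replaced by a stub that collects records; flatten and runOnRow are hypothetical names used only for this illustration.

```javascript
// Hypothetical stand-in for the function passed to JS(); in BigQuery,
// emit is supplied by the engine, here it is a stub provided by runOnRow.
function flatten(r, emit) {
  function processKey(node, parent) {
    Object.keys(node).forEach(function (key) {
      var value = node[key];
      if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
        // Nested object: build the dotted path for its children
        processKey(value, parent === '' ? key : parent + '.' + key);
      } else {
        emit({ parent: parent, key: key, value: String(value) });
      }
    });
  }
  processKey(JSON.parse(r.json), '');
}

// Runs flatten on one row and returns the emitted records as an array.
function runOnRow(jsonText) {
  const out = [];
  flatten({ json: jsonText }, (rec) => out.push(rec));
  return out;
}

console.log(runOnRow('{"json":{"B":"1","C":"1"}}'));
```

For this row, two records are collected: key B and key C, both with parent "json" and value "1". Grouping by parent and key instead of key alone would distinguish keys that share a name but sit at different nesting levels.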

Source: a similar question on Stack Overflow about counting the occurrences of each key in a JSON column: https://stackoverflow.com/questions/39988765/
