
sql - BigQuery: Store semi-structured JSON data

Reposted. Author: 行者123. Updated: 2023-12-04 01:13:09

My data can have varying JSON keys. I want to store all of it in BigQuery and explore which fields are available afterwards.

My structure looks like this:

[
{id: 1111, data: {a:27, b:62, c: 'string'} },
{id: 2222, data: {a:27, c: 'string'} },
{id: 3333, data: {a:27} },
{id: 4444, data: {a:27, b:62, c:'string'} },
]

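I would like to use the STRUCT type, but it seems every field needs to be declared up front?

I then want to be able to query how often each key appears, and essentially run queries across all records that have, for example, the a key, as if it were in its own column.

Side note: this data comes from URL query strings. Would it perhaps be better to push the full URL and run the analysis using functions?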

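Best answer

For data like your example, there are two main ways to store semi-structured data:

Option #1: Store JSON strings

You can store the data field as a JSON string and then use the JSON_EXTRACT function to pull out the values it can find. For any value it cannot find, it returns NULL.

Since you mentioned needing to do mathematical analysis on the fields, let's do a simple SUM of the values of a and b: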

# Creating an example table using the WITH statement, this would not be needed
# for a real table.
WITH records AS (
  SELECT 1111 AS id, "{\"a\":27, \"b\":62, \"c\": \"string\"}" AS data
  UNION ALL
  SELECT 2222 AS id, "{\"a\":27, \"c\": \"string\"}" AS data
  UNION ALL
  SELECT 3333 AS id, "{\"a\":27}" AS data
  UNION ALL
  SELECT 4444 AS id, "{\"a\":27, \"b\":62, \"c\": \"string\"}" AS data
)

# Example Query
SELECT SUM(aValue) AS aSum, SUM(bValue) AS bSum FROM (
  SELECT id,
    CAST(JSON_EXTRACT(data, "$.a") AS INT64) AS aValue, # Extract & cast as an INT64
    CAST(JSON_EXTRACT(data, "$.b") AS INT64) AS bValue  # Extract & cast as an INT64
  FROM records
)

# results
# Row | aSum | bSum
# 1 | 108 | 124

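This approach has some pros and cons:

Pros

  • The syntax is fairly straightforward
  • Less error-prone

Cons

  • Storage costs will be slightly higher, since you have to store every character needed to serialize the data as JSON.
  • Queries will run slower than with pure native SQL.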
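
Because JSON_EXTRACT returns NULL for a missing key, the same JSON-string layout can also show how often each key appears. The following is a minimal sketch, not a definitive solution: it reuses the sample rows from the question, the output column names (aCount, bCount, cCount) are illustrative, and it assumes the keys of interest are known in advance.

```sql
# Same sample table as in the example above; not needed for a real table.
WITH records AS (
  SELECT 1111 AS id, '{"a":27, "b":62, "c": "string"}' AS data
  UNION ALL
  SELECT 2222, '{"a":27, "c": "string"}'
  UNION ALL
  SELECT 3333, '{"a":27}'
  UNION ALL
  SELECT 4444, '{"a":27, "b":62, "c": "string"}'
)

# Count the rows in which each (known) key is present.
# JSON_EXTRACT yields NULL when the path is missing, so those rows are skipped.
SELECT
  COUNTIF(JSON_EXTRACT(data, '$.a') IS NOT NULL) AS aCount,
  COUNTIF(JSON_EXTRACT(data, '$.b') IS NOT NULL) AS bCount,
  COUNTIF(JSON_EXTRACT(data, '$.c') IS NOT NULL) AS cCount
FROM records

# results
# Row | aCount | bCount | cCount
# 1   | 4      | 2      | 3
```

This only covers keys you name in the query. Discovering keys you do not know about needs a different technique.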

Option #2: Repeated fields

BigQuery has support for repeated fields, which lets you take your structure and express it natively in SQL.

Using the same example, here is how we would do that:

## Using a WITH to create a sample table
WITH records AS (SELECT * FROM UNNEST(ARRAY<STRUCT<id INT64, data ARRAY<STRUCT<key STRING, value STRING>>>>[
  (1111, [("a","27"),("b","62"),("c","string")]),
  (2222, [("a","27"),("c","string")]),
  (3333, [("a","27")]),
  (4444, [("a","27"),("b","62"),("c","string")])
])),
## Using another WITH table to take records and unnest them to be joined later
recordsUnnested AS (
  SELECT id, key, value
  FROM records, UNNEST(records.data) AS keyVals
)

SELECT SUM(aValue) AS aSum, SUM(bValue) AS bSum
FROM (
  SELECT R.id, CAST(RA.value AS INT64) AS aValue, CAST(RB.value AS INT64) AS bValue
  FROM records R
  LEFT JOIN recordsUnnested RA ON R.id = RA.id AND RA.key = "a"
  LEFT JOIN recordsUnnested RB ON R.id = RB.id AND RB.key = "b"
)

# results
# Row | aSum | bSum
# 1 | 108 | 124

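As you can see, doing the same thing this way is still fairly complex. You also have to store every value as a string and CAST it to another type when needed, because you cannot mix types within a repeated field.

Pros

  • Storage size will be smaller than with JSON
  • Queries will typically execute faster.

Cons

  • The syntax is more complex and less straightforward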
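
One thing the repeated key/value layout makes easy is the key-frequency question from the original post, since the keys are ordinary column values and do not need to be known ahead of time. A minimal sketch, assuming the same sample table as above (the kv alias and the occurrences column name are illustrative):

```sql
## Same sample table as in the example above
WITH records AS (SELECT * FROM UNNEST(ARRAY<STRUCT<id INT64, data ARRAY<STRUCT<key STRING, value STRING>>>>[
  (1111, [("a","27"),("b","62"),("c","string")]),
  (2222, [("a","27"),("c","string")]),
  (3333, [("a","27")]),
  (4444, [("a","27"),("b","62"),("c","string")])
]))

## Flatten every key/value pair into its own row, then count rows per key
SELECT kv.key, COUNT(*) AS occurrences
FROM records, UNNEST(records.data) AS kv
GROUP BY kv.key
ORDER BY occurrences DESC

# results
# Row | key | occurrences
# 1   | a   | 4
# 2   | c   | 3
# 3   | b   | 2
```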

Hope that helps, good luck.

Regarding "sql - BigQuery: Store semi-structured JSON data", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54968020/
