
google-bigquery - How to unnest multiple arrays in BigQuery?

Reposted. Author: 行者123. Updated: 2023-12-03 21:23:31

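I have this JSON stored in a BigQuery table in 3 fields: token, questions, answers

token: STRING, questions: STRING, answers: STRING

questions and answers are STRING because they are dynamic fields.

The token field holds a single value.

The questions field holds a dictionary whose "fields" key is a list of objects, one per question (3 questions here).

The answers field is a list of objects answering the 3 questions; the id is used to match each question to its answer. Below is the JSON downloaded from BigQuery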

token          questions                                      answers
18e6d8e445 {"fields": [{"id": "L39FyvUohKDV", "properties": {}, "ref": "d8834652-3acf-4541-8354-1e3dcd716667", "title": "What did you think about the changes?", "type": "short_text"}, {"id": "krs82KgxHwGb", "properties": {}, "ref": "5b6e6796-635b-4595-9404-e81617d4540b", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}, {"id": "lBzHtCuzHFM4", "properties": {}, "ref": "b76be913-19b9-4b8a-b2ac-3fb645a65a5c", "title": "Your email address", "type": "email"}], "id": "SdzXVn", "title": "Google Shopping 5/4/18"} [{"field": {"id": "L39FyvUohKDV", "type": "short_text"}, "text": "t", "type": "text"}, {"field": {"id": "krs82KgxHwGb", "type": "opinion_scale"}, "number": 10, "type": "number"}, {"email": "t@t.com", "field": {"id": "lBzHtCuzHFM4", "type": "email"}, "type": "email"}]
949b2c57e3 {"fields": [{"id": "krs82KgxHwGb", "properties": {}, "ref": "5b6e6796-635b-4595-9404-e81617d4540b", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}, {"id": "lBzHtCuzHFM4", "properties": {}, "ref": "b76be913-19b9-4b8a-b2ac-3fb645a65a5c", "title": "Your email address", "type": "email"}, {"id": "L39FyvUohKDV", "properties": {}, "ref": "d8834652-3acf-4541-8354-1e3dcd716667", "title": "What did you think about the changes?", "type": "short_text"}], "id": "SdzXVn", "title": "Google Shopping 5/4/18"} [{"field": {"id": "krs82KgxHwGb", "type": "opinion_scale"}, "number": 10, "type": "number"}, {"email": "someone@mail.com", "field": {"id": "lBzHtCuzHFM4", "type": "email"}, "type": "email"}, {"field": {"id": "L39FyvUohKDV", "type": "short_text"}, "text": "they were awesome", "type": "text"}]
146c49cdd6 {"fields": [{"id": "CxhfK22a3XWE", "properties": {}, "ref": "d8834652-3acf-4541-8354-1e3dcd716667", "title": "What did you think about the changes?", "type": "short_text"}, {"id": "oUZxPRaKjmFr", "properties": {}, "ref": "5b6e6796-635b-4595-9404-e81617d4540b", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}, {"id": "zUIP73oXpLD6", "properties": {}, "ref": "b76be913-19b9-4b8a-b2ac-3fb645a65a5c", "title": "Your email address", "type": "email"}], "id": "kaiAsx", "title": "a - b"} [{"field": {"id": "CxhfK22a3XWE", "type": "short_text"}, "text": "nice", "type": "text"}, {"field": {"id": "oUZxPRaKjmFr", "type": "opinion_scale"}, "number": 2, "type": "number"}, {"email": "foo@bar.com", "field": {"id": "zUIP73oXpLD6", "type": "email"}, "type": "email"}]

@mikhail-berlyant provided the query below, which gets me very close to what I want. The only problem is that I can't get the answers.
SELECT distinct token, id, title AS question,
JSON_EXTRACT_SCALAR(CONCAT('{',a,'}'), '$.type') answer_type
--REPLACE(REGEXP_EXTRACT(b, r'"type":".+?"\s*,\s*".+?":(.+)'), '"', '') answer
FROM `v1-dev-main.typeform.responses`,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(definition, '$.fields'), r'"title":"(.+?)"')) title WITH OFFSET pos1,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(definition, '$.fields'), r'"id":"(.+?)"')) id WITH OFFSET pos2,
UNNEST(REGEXP_EXTRACT_ALL(answers, r'"field": {(.+?)}')) a WITH OFFSET pos3
--UNNEST(REGEXP_EXTRACT_ALL(answers, r'{(.+?),\s*"field":{.+?}')) b WITH OFFSET pos4
WHERE pos1 = pos2
--AND pos3 = pos4
AND id = JSON_EXTRACT_SCALAR(CONCAT('{',a,'}'), '$.id')

This is the result of the query above
token                       id             question       answer_type
146c43c81cd5780839d3cdd6 zUIP73oXpLD6 Your email address email
146c493c1cd5780839d3cdd6 oUZxPRaKjmFr How useful is this feature turning out to be for you? opinion_scale
146c493c05d5780839d3cdd6 CxhfK22a3XWE What did you think about the changes? short_text
18e6d8e33df44a1aa451b445 lBzHtCuzHFM4 Your email address email
18e6d8e33df44a1aa451b445 L39FyvUohKDV What did you think about the changes? short_text
18e6d0fa014bfa1aa451b445 krs82KgxHwGb How useful is this feature turning out to be for you? opinion_scale
a63b20df691c9a949b2c57e3 krs82KgxHwGb How useful is this feature turning out to be for you? opinion_scale
a63b20df691c9a949b2c57e3 lBzHtCuzHFM4 Your email address email
a63b258ce0339a949b2c57e3 L39FyvUohKDV What did you think about the changes? short_text

Now I'm only missing the answers.

Best answer

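The example below is for BigQuery Standard SQL. It makes some assumptions about the format of the JSON strings in your data, so the regular expressions will most likely need some adjustment. It does work with the dummy data below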



#standardSQL
WITH `project.dataset.table` AS (
SELECT 12345 token,
'''{"fields": [
{"id":"1","title":"Question 1?"},
{"id":"2","title":"Questions 2?"},
{"id":"3","title":"Question 3?"}
]}''' questions,
'''[
{"type":"text", "text":"answer 1", "field":{"id":"1", "type":"short_text"}},
{"type":"number", "number":42, "field":{"id":"2", "type":"opinion_scale"}},
{"type":"email", "email":"an_account@example.com", "field":{"id":"3", "type":"email"}}
]''' answers
)
SELECT token, id, title AS question,
JSON_EXTRACT_SCALAR(CONCAT('{',a,'}'), '$.type') answer_type,
REPLACE(REGEXP_EXTRACT(b, r'"type":".+?"\s*,\s*".+?":(.+)'), '"', '') answer
FROM `project.dataset.table`,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(questions, '$.fields'), r'"title":"(.+?)"')) title WITH OFFSET pos1,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(questions, '$.fields'), r'"id":"(.+?)"')) id WITH OFFSET pos2,
UNNEST(REGEXP_EXTRACT_ALL(answers, r'"field":{(.+?)}')) a WITH OFFSET pos3,
UNNEST(REGEXP_EXTRACT_ALL(answers, r'{(.+?),\s*"field":{.+?}')) b WITH OFFSET pos4
WHERE pos1 = pos2
AND pos3 = pos4
AND id = JSON_EXTRACT_SCALAR(CONCAT('{',a,'}'), '$.id')

The result is

Row token   id  question        answer_type     answer   
1 12345 1 Question 1? short_text answer 1
2 12345 2 Questions 2? opinion_scale 42
3 12345 3 Question 3? email an_account@example.com
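
One format assumption worth checking is whitespace: the patterns above expect `"field":{` with nothing after the colon, while the sample rows in the question have `"field": {`. The sketch below uses Python's `re` module purely as a stand-in for BigQuery's regex engine (an assumption made for illustration; the sample string is a single answer taken from the question's data) to show the difference and how `\s*` tolerates both forms.

```python
import re

# One answer from the question's sample data (note the space after "field":).
answers = '[{"field": {"id": "L39FyvUohKDV", "type": "short_text"}, "text": "t", "type": "text"}]'

# Pattern as written in the query above: requires "field":{ with no whitespace.
strict = re.findall(r'"field":{(.+?)}', answers)

# Tolerant variant: \s* allows optional whitespace after the colon.
tolerant = re.findall(r'"field":\s*{(.+?)}', answers)

print(strict)    # no match on this input
print(tolerant)  # the contents of the nested "field" object
```

With the strict pattern nothing is extracted from this row, so the join on id would silently return no rows for it.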

Update based on the comments below



#standardSQL
WITH `project.dataset.table` AS (
SELECT "12345" token, '{"fields": [{"id":"1","title":"Question 1?"},{"id":"2","title":"Questions 2?"},{"id":"3","title":"Question 3?"}]}' questions,'[ {"type":"text", "text":"answer 1", "field":{"id":"1", "type":"short_text"}},{"type":"number", "number":42, "field":{"id":"2", "type":"opinion_scale"}},{"type":"email", "email":"an_account@example.com", "field":{"id":"3", "type":"email"}}]' answers UNION ALL
SELECT "18e6d8e33df440fa014bfa1aa451b445", '{"fields": [{"id": "L39FyvUohKDV", "properties": {}, "ref": "d8834652-3acf-4541-8354-1e3dcd716667", "title": "What did you think about the changes?", "type": "short_text"}, {"id": "krs82KgxHwGb", "properties": {}, "ref": "5b6e6796-635b-4595-9404-e81617d4540b", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}, {"id": "lBzHtCuzHFM4", "properties": {}, "ref": "b76be913-19b9-4b8a-b2ac-3fb645a65a5c", "title": "Your email address", "type": "email"}], "id": "SdzXVn", "title": "Google Shopping 5/4/18"}', '[{"field": {"id": "L39FyvUohKDV", "type": "short_text"}, "text": "t", "type": "text"}, {"field": {"id": "krs82KgxHwGb", "type": "opinion_scale"}, "number": 10, "type": "number"}, {"email": "t@t.com", "field": {"id": "lBzHtCuzHFM4", "type": "email"}, "type": "email"}]' UNION ALL
SELECT "a63b258ce03360df691c9a949b2c57e3", '{"fields": [{"id": "krs82KgxHwGb", "properties": {}, "ref": "5b6e6796-635b-4595-9404-e81617d4540b", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}, {"id": "lBzHtCuzHFM4", "properties": {}, "ref": "b76be913-19b9-4b8a-b2ac-3fb645a65a5c", "title": "Your email address", "type": "email"}, {"id": "L39FyvUohKDV", "properties": {}, "ref": "d8834652-3acf-4541-8354-1e3dcd716667", "title": "What did you think about the changes?", "type": "short_text"}], "id": "SdzXVn", "title": "Google Shopping 5/4/18"}', '[{"field": {"id": "krs82KgxHwGb", "type": "opinion_scale"}, "number": 10, "type": "number"}, {"email": "someone@mail.com", "field": {"id": "lBzHtCuzHFM4", "type": "email"}, "type": "email"}, {"field": {"id": "L39FyvUohKDV", "type": "short_text"}, "text": "they were awesome", "type": "text"}]' UNION ALL
SELECT "146c493c051a0a481cd5780839d3cdd6", '{"fields": [{"id": "CxhfK22a3XWE", "properties": {}, "ref": "d8834652-3acf-4541-8354-1e3dcd716667", "title": "What did you think about the changes?", "type": "short_text"}, {"id": "oUZxPRaKjmFr", "properties": {}, "ref": "5b6e6796-635b-4595-9404-e81617d4540b", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}, {"id": "zUIP73oXpLD6", "properties": {}, "ref": "b76be913-19b9-4b8a-b2ac-3fb645a65a5c", "title": "Your email address", "type": "email"}], "id": "kaiAsx", "title": "a - b"}', '[{"field": {"id": "CxhfK22a3XWE", "type": "short_text"}, "text": "nice", "type": "text"}, {"field": {"id": "oUZxPRaKjmFr", "type": "opinion_scale"}, "number": 2, "type": "number"}, {"email": "foo@bar.com", "field": {"id": "zUIP73oXpLD6", "type": "email"}, "type": "email"}]'
)
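SELECT token, id, title AS question,
JSON_EXTRACT_SCALAR(CONCAT('{',a,'}'), '$.type') answer_type,
-- The value key depends on the answer type, so take the first one that is present
COALESCE(JSON_EXTRACT_SCALAR(b, '$.text'),JSON_EXTRACT_SCALAR(b, '$.number'),JSON_EXTRACT_SCALAR(b, '$.email')) AS answer
FROM `project.dataset.table`,
-- Question titles and ids, paired by position (pos1 = pos2)
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(questions, '$.fields'), r'"title":\s*"(.+?)"')) title WITH OFFSET pos1,
UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(questions, '$.fields'), r'"id":\s*"(.+?)"')) id WITH OFFSET pos2,
-- a: contents of each answer's nested "field" object (id and type)
UNNEST(REGEXP_EXTRACT_ALL(answers, r'"field":\s*{(.+?)}')) a WITH OFFSET pos3,
-- b: each answer object with the nested "field" object blanked out, so {.+?} matches one whole answer
UNNEST(REGEXP_EXTRACT_ALL(REGEXP_REPLACE(answers, r'"field":\s*{.+?}', '"field": ""'), r'{.+?}')) b WITH OFFSET pos4
WHERE pos1 = pos2
AND pos3 = pos4
-- Keep only the rows where the question id equals the answer's field id
AND id = JSON_EXTRACT_SCALAR(CONCAT('{',a,'}'), '$.id')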
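
The `b` expression in the updated query relies on a two-step trick: first blank out the nested `"field"` object, then match each now-flat `{...}` object, which is valid JSON on its own. Below is a minimal Python sketch of that idea, with `re` and `json` standing in for REGEXP_REPLACE, REGEXP_EXTRACT_ALL and JSON_EXTRACT_SCALAR. The helper name `answer_value` is hypothetical, and the sample string is two answers taken from the question's data.

```python
import json
import re

answers = (
    '[{"field": {"id": "krs82KgxHwGb", "type": "opinion_scale"}, "number": 10, "type": "number"}, '
    '{"email": "someone@mail.com", "field": {"id": "lBzHtCuzHFM4", "type": "email"}, "type": "email"}]'
)

# Step 1: replace each nested "field" object with an empty string value.
flattened = re.sub(r'"field":\s*{.+?}', '"field": ""', answers)

# Step 2: with no nested braces left, a non-greedy {.+?} matches one answer object at a time.
objects = re.findall(r'{.+?}', flattened)


def answer_value(obj_text):
    """Return the first of text/number/email that is present, as a string (like COALESCE)."""
    obj = json.loads(obj_text)
    for key in ("text", "number", "email"):
        if key in obj:
            return str(obj[key])
    return None


# Step 3: each match parses as JSON on its own, so the value can be read by key.
values = [answer_value(o) for o in objects]
print(values)
```

The same caveat applies as in the SQL: the non-greedy match stops at the first `}`, so a free-text answer that itself contains `}` would be cut short.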

The output is

Row token                               id              question                                                answer_type     answer   
1 12345 1 Question 1? short_text answer 1
2 12345 2 Questions 2? opinion_scale 42
3 12345 3 Question 3? email an_account@example.com
4 18e6d8e33df440fa014bfa1aa451b445 L39FyvUohKDV What did you think about the changes? short_text t
5 18e6d8e33df440fa014bfa1aa451b445 krs82KgxHwGb How useful is this feature turning out to be for you? opinion_scale 10
6 18e6d8e33df440fa014bfa1aa451b445 lBzHtCuzHFM4 Your email address email t@t.com
7 a63b258ce03360df691c9a949b2c57e3 krs82KgxHwGb How useful is this feature turning out to be for you? opinion_scale 10
8 a63b258ce03360df691c9a949b2c57e3 lBzHtCuzHFM4 Your email address email someone@mail.com
9 a63b258ce03360df691c9a949b2c57e3 L39FyvUohKDV What did you think about the changes? short_text they were awesome
10 146c493c051a0a481cd5780839d3cdd6 CxhfK22a3XWE What did you think about the changes? short_text nice
11 146c493c051a0a481cd5780839d3cdd6 oUZxPRaKjmFr How useful is this feature turning out to be for you? opinion_scale 2
12 146c493c051a0a481cd5780839d3cdd6 zUIP73oXpLD6 Your email address email foo@bar.com
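
One way to sanity-check output like this outside BigQuery is to parse the JSON and join on the id directly. The Python sketch below does that for one row; `match_answers` is a hypothetical helper name, and the sample strings are abbreviated from the question's data (the `properties`, `ref` and email entries are left out for brevity).

```python
import json


def match_answers(token, questions_json, answers_json):
    """Return (token, id, question, answer_type, answer) tuples, joined on the field id."""
    titles = {f["id"]: f["title"] for f in json.loads(questions_json)["fields"]}
    rows = []
    for ans in json.loads(answers_json):
        field = ans["field"]
        if field["id"] not in titles:
            continue  # inner join: skip answers without a matching question
        # The value lives under a key that depends on the answer type.
        value = next((ans[k] for k in ("text", "number", "email") if k in ans), None)
        rows.append((token, field["id"], titles[field["id"]], field["type"],
                     None if value is None else str(value)))
    return rows


questions = (
    '{"fields": [{"id": "CxhfK22a3XWE", "title": "What did you think about the changes?", "type": "short_text"}, '
    '{"id": "oUZxPRaKjmFr", "title": "How useful is this feature turning out to be for you?", "type": "opinion_scale"}]}'
)
answers = (
    '[{"field": {"id": "CxhfK22a3XWE", "type": "short_text"}, "text": "nice", "type": "text"}, '
    '{"field": {"id": "oUZxPRaKjmFr", "type": "opinion_scale"}, "number": 2, "type": "number"}]'
)

rows = match_answers("146c493c051a0a481cd5780839d3cdd6", questions, answers)
for row in rows:
    print(row)
```

For this input the two tuples should line up with rows 10 and 11 of the output above.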

Regarding "google-bigquery - How to unnest multiple arrays in BigQuery?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50616695/
