gpt4 book ai didi

google-app-engine - 大查询 : How to flatten repeated structured property imported from datastore

转载 作者:太空宇宙 更新时间:2023-11-03 15:28:10 27 4
gpt4 key购买 nike

亲爱的

本月我开始使用 BigQuery 分析 GAE 数据存储中的数据。首先,我通过 GAE 控制台的“Datastore Admin”页面将数据导出到 Google Cloud Storage。然后,我将数据从 Google Cloud Storage 导入到 BigQuery。除了重复的结构化属性外,它工作得非常顺利。我希望导入的记录格式为:

    parent:"James",
children: [{
name: "name1",
age: 5,
gender: "M"
}, {
name: "name2",
age: 50,
gender: "F"
}, {
name: "name3",
age: 33,
gender: "M"
},
]

我知道如何将上述格式的数据展平。但是 BigQuery 中的实际数据格式似乎是以下格式:

    parent: "James",
children.name:["name1", "name2", "name3"],
children.age:[5, 50, 33],
children.gender:["M", "F", "M"],

我想知道是否可以在 BigQuery 中展平上述数据以供进一步分析。我心目中比较理想的结果表格式是:

    parentName, children.name, children.age, children.gender
James, name1, 5, "M"
James, name2, 50, "F"
James, name3, 33, "M"

干杯!

最佳答案

最近推出BigQuery Standard SQL - 事情好多了!
尝试以下操作(确保取消选中 Show Options 下的 Use Legacy SQL 复选框)

WITH parents AS (
SELECT
"James" AS parentName,
STRUCT(
["name1", "name2", "name3"] AS name,
[5, 50, 33] AS age,
["M", "F", "M"] AS gender
) AS children
)
SELECT
parentName, childrenName, childrenAge, childrenGender
FROM
parents,
UNNEST(children.name) AS childrenName WITH OFFSET AS pos_name,
UNNEST(children.age) AS childrenAge WITH OFFSET AS pos_age,
UNNEST(children.gender) AS childrenGender WITH OFFSET AS pos_gender
WHERE
pos_name = pos_age AND pos_name = pos_gender

这里 - 原始表 - parents - 有以下数据

enter image description here

将各自的schema作为

[{
"parentName": "James",
"children": {
"name": ["name1", "name2", "name3"],
"age": ["5", "50", "33" ],
"gender": ["M", "F", "M"]
}
}]

输出

enter image description here

注意:以上内容完全基于我在原始问题中看到的内容,很可能需要根据您的具体需求进行调整
希望这对前进的方向和从哪里开始有所帮助!

Added:

以上查询使用基于行的 CROSS JOINS,这意味着首先组装同一父项的所有变体,然后 WHERE 子句过滤掉“错误”的变体。

相比之下,以下版本,使用 INNER JOIN 来消除这种“副作用”

WITH parents AS (
SELECT
"James" AS parentName,
STRUCT(
["name1", "name2", "name3"] AS name,
[5, 50, 33] AS age,
["M", "F", "M"] AS gender
) AS children
)
SELECT
parentName, childrenName, childrenAge, childrenGender
FROM
parents, UNNEST(children.name) AS childrenName WITH OFFSET AS pos_name
JOIN UNNEST(children.age) AS childrenAge WITH OFFSET AS pos_age
ON pos_name = pos_age
JOIN UNNEST(children.gender) AS childrenGender WITH OFFSET AS pos_gender
ON pos_age = pos_gender

直觉上,我希望第二个版本对于更大的表更有效

关于google-app-engine - 大查询 : How to flatten repeated structured property imported from datastore,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/17228281/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com