gpt4 book ai didi

json - U-SQL - 从复杂的嵌套 json 文件中提取数据

转载 作者:行者123 更新时间:2023-12-03 03:00:10 25 4
gpt4 key购买 nike

我的json结构如下:

{
"First":"xxxx",
"Country":"XX",
"Loop": {
"Links": [
{
"Url":"xxxx",
"Time":123
}, {
"Url":"xxxx",
"Time":123
}],
"TotalTime":123,
"Date":"2018-04-09T10:29:39.0233082+00:00"
}

我想提取属性

First
Country
Url & Time foreach object in the array
TotalTime
Date

这是我的查询

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@extration =
EXTRACT
jsonString string
FROM @"/storage-api/input.json"
USING Extractors.Tsv(quoting:false);

@cleanUp = SELECT jsonString FROM @extration WHERE (!jsonString.Contains("Part: h" ) AND jsonString!= "465}");

@jsonify = SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(jsonString) AS obj FROM @cleanUp;

@columnized = SELECT
obj["First"] AS first,
obj["Country"] AS country
FROM @jsonify;

OUTPUT @columnized
TO @"/storage-api/outputs/tpe1-output.csv"
USING Outputters.Csv();

但是这个查询只提取前2个属性,我不知道如何查询“Loop”内的嵌套数据

最佳答案

您可以使用 MultiLevelJsonExtractor (注释 here )和 JSON 路径(例如 Loop.Links[*])来完成此操作。 MultiLevelJsonExtractor 有一个很好的功能,如果在基本路径中找不到您的节点,它将递归地检查它,尽管我不确定性能如何在大型 JSON 文档或大型JSON 文档的数量。

试试这个:

DECLARE @input string = "/input/input65.json";

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

@result =
EXTRACT First string,
Country string,
Date DateTime,
Url string,
Time string,
TotalTime int
FROM @input
USING new MultiLevelJsonExtractor("Loop.Links[*]",

false,
"First",
"Country",
"Date",
"Url",
"Time",
"TotalTime"
);


OUTPUT @result
TO "/output/output.csv"
USING Outputters.Csv();

我的结果:

Results

HTH

关于json - U-SQL - 从复杂的嵌套 json 文件中提取数据,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/49751881/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com