
azure - Reading a nested array in an Azure Data Factory Data Flow

Reposted. Author: 行者123. Last updated: 2023-12-03 06:17:56

I have a complex JSON response returned from an API, and I need to store some parts of that response in a CSV file in Azure Blob Storage.

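After a series of transformations in the data flow, I was finally able to filter down to the data that needs to go into the CSV. The data looks like this: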

"Rows": [
  [
    "record1",
    123,
    true,
    "test1"
  ],
  [
    "record2",
    456,
    false,
    "test2"
  ]
]

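At this point I am stuck on how to handle this nested array. I added a Derived Column transformation and tried many expressions, but none of them worked.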

I need a CSV like the following:

record1 123 true test1
record2 456 false test2

Any suggestions on how to achieve this would be helpful.

The response from the ADX API, for your reference:

[
  {
    "FrameType": "DataTable",
    "TableId": 1,
    "TableKind": "PrimaryResult",
    "TableName": "PrimaryResult",
    "Columns": [
      {
        "ColumnName": "RecordID",
        "ColumnType": "string"
      },
      {
        "ColumnName": "RecordNumber",
        "ColumnType": "integer"
      },
      {
        "ColumnName": "IsValid",
        "ColumnType": "boolean"
      },
      {
        "ColumnName": "Remarks",
        "ColumnType": "string"
      }
    ],
    "Rows": [
      [
        "record1",
        123,
        true,
        "test1"
      ],
      [
        "record2",
        456,
        false,
        "test2"
      ]
    ]
  }
]
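
To pin down the target independently of Data Flow, the mapping from this response to the desired rows can be sketched in Python. This is only an illustration of the expected result, not a Data Factory solution: the function names `cell_to_text` and `primary_rows_to_csv` are hypothetical, and the sketch assumes the response contains exactly one frame whose TableKind is "PrimaryResult".

```python
import csv
import io
import json


def cell_to_text(value):
    """Render one JSON value as CSV text."""
    # Strings pass through unchanged; numbers, booleans and null keep their
    # JSON spelling (true/false rather than Python's True/False).
    return value if isinstance(value, str) else json.dumps(value)


def primary_rows_to_csv(response_text, delimiter=","):
    """Return the Rows of the PrimaryResult table as delimited text."""
    frames = json.loads(response_text)
    # Assumption: exactly one frame has TableKind == "PrimaryResult".
    table = next(f for f in frames if f.get("TableKind") == "PrimaryResult")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for row in table["Rows"]:
        writer.writerow([cell_to_text(cell) for cell in row])
    return buffer.getvalue()


# A trimmed copy of the response above (Columns omitted for brevity).
sample = """[{"FrameType": "DataTable", "TableKind": "PrimaryResult",
  "Rows": [["record1", 123, true, "test1"], ["record2", 456, false, "test2"]]}]"""
print(primary_rows_to_csv(sample), end="")
```

Each inner array of Rows becomes one output line, which is the shape the data flow has to reproduce.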

Best answer

  • You can pass the value of Rows from the API response to the data flow and apply transformations there to get the desired result.

  • Since I do not have access to your API, I stored the response in a JSON file and used a Lookup activity to read it.

[Screenshot: Lookup activity reading the JSON file]

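  • Next, I passed this value as a string to my data flow parameter (named str). As the source, I used a file containing the following data (any arbitrary file with 1 row and 1 column will do).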

[Screenshot: contents of the source file]

  • Now apply the transformations specified in the following data flow JSON. With these, I was able to meet your requirement.
{
  "name": "dataflow1",
  "properties": {
    "type": "MappingDataFlow",
    "typeProperties": {
      "sources": [
        {
          "dataset": {
            "referenceName": "DelimitedText1",
            "type": "DatasetReference"
          },
          "name": "source1"
        }
      ],
      "sinks": [
        {
          "dataset": {
            "referenceName": "DelimitedText2",
            "type": "DatasetReference"
          },
          "name": "sink1"
        }
      ],
      "transformations": [
        {
          "name": "derivedColumn1"
        },
        {
          "name": "derivedColumn2"
        },
        {
          "name": "select1"
        }
      ],
      "scriptLines": [
        "parameters{",
        " str as string ('[[\"record1\",123,true,\"test1\"],[\"record2\",456,false,\"test2\"]]')",
        "}",
        "source(output(",
        " id as integer",
        " ),",
        " allowSchemaDrift: true,",
        " validateSchema: false,",
        " ignoreNoFilesFound: false) ~> source1",
        "source1 derive(tp = toString(unfold(split($str, '],[')))) ~> derivedColumn1",
        "derivedColumn1 derive(tp = replace(replace(tp,'[[',''),']]','')) ~> derivedColumn2",
        "derivedColumn2 select(mapColumn(",
        " tp",
        " ),",
        " skipDuplicateMapInputs: true,",
        " skipDuplicateMapOutputs: true) ~> select1",
        "select1 sink(allowSchemaDrift: true,",
        " validateSchema: false,",
        " partitionFileNames:['op.csv'],",
        " umask: 0022,",
        " preCommands: [],",
        " postCommands: [],",
        " skipDuplicateMapInputs: true,",
        " skipDuplicateMapOutputs: true,",
        " saveOrder: 1,",
        " partitionBy('hash', 1)) ~> sink1"
      ]
    }
  }
}
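
The string handling in the two derive steps can be traced outside Data Factory with a small Python sketch. It only mirrors the split, unfold and replace logic on the parameter's default value and does not call any Data Factory API; the function name `unfold_rows` is hypothetical.

```python
def unfold_rows(array_text):
    """Mimic the split / unfold / replace steps on a stringified array of arrays."""
    # split($str, '],[') cuts the text at each boundary between inner arrays.
    pieces = array_text.split("],[")
    # unfold(...) emits one row per piece; the two replace() calls then strip
    # the outer brackets left on the first and last piece.
    return [piece.replace("[[", "").replace("]]", "") for piece in pieces]


# Same text as the default value of the str parameter in the script above.
param = '[["record1",123,true,"test1"],["record2",456,false,"test2"]]'
for line in unfold_rows(param):
    print(line)
```

Two caveats follow from the trace. The approach relies on the array being serialized compactly, with no whitespace between `]`, `,` and `[`, and on no string value containing `],[` itself. The string values also keep their double quotes in the tp column, so whether those quotes appear in op.csv presumably depends on the quote settings of the sink dataset.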
  • For the sink dataset, I used the following configuration:

[Screenshot: sink dataset configuration]

  • Here is the resulting output file, for reference:

[Screenshot: resulting output file]

Regarding azure - reading a nested array in an Azure Data Factory Data Flow, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/76069911/
