azure - How to delete folders in an Azure Data Lake Gen2 using ADF

Reposted. Author: 行者123. Updated: 2023-12-03 06:20:59

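I'm having trouble deleting folders in the data lake from an ADF pipeline. I tried an approach similar to the one I use for files, a Get Metadata activity followed by a Delete activity inside a ForEach activity, but couldn't get it to work.

Every day I write a snapshot of the data, and the file path looks like this -> snapshot/tablename/date when the file was unloaded. The date is a plain GETUTCDATE formatted as yyyyMMddHHmmss.

So path = snapshot/tablename/yyyyMMddHHmmss/*.parquet

My goal is to delete all folders in the tablename folder that are older than 5 days.

Any tips are appreciated.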

Best answer

My goal is to delete all the folders in the tablename folder which is older than 5 days.

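You can achieve this with a Filter activity, as described in the steps that follow.

First create an integer pipeline parameter with the value -(n-1), where n is the number of days (for n = 5 the value is -4). Then use the following expression in a Set Variable activity to compute the last nth day in yyyyMMdd format.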

@addDays(utcnow(),pipeline().parameters.ndays,'yyyyMMdd')
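
To make the -(n-1) offset concrete, here is a minimal Python sketch of the same date arithmetic. It is not ADF code; the function name `cutoff_yyyymmdd` and the run time are hypothetical, picked only for illustration.

```python
from datetime import datetime, timedelta, timezone


def cutoff_yyyymmdd(ndays: int, now: datetime) -> str:
    """Mirror @addDays(utcnow(), ndays, 'yyyyMMdd') for a given 'now'."""
    return (now + timedelta(days=ndays)).strftime("%Y%m%d")


# Hypothetical pipeline run time, used only for this example.
now = datetime(2023, 4, 10, 8, 30, 0, tzinfo=timezone.utc)
n = 5  # number of days to keep
cutoff = cutoff_yyyymmdd(-(n - 1), now)
print(cutoff)  # 20230406
```

With this cutoff, a folder dated 20230405 or earlier counts as older than 5 days, while the folders from the five most recent calendar days (20230406 through 20230410) are kept.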

Then use a Get Metadata activity with the childItems field to get the list of folders under the table folder.

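Now pass this childItems array to a Filter activity. The Filter activity keeps only the names of folders that are older than 5 days.

Filter items: @activity('Get Metadata1').output.childItems

Filter condition: @greater(int(variables('last_5thday')), int(substring(item().name,0,8)))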

The Filter activity outputs an array that contains only the folders matching the condition.
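
The Filter condition can be checked outside ADF with a small Python sketch. The `folders_older_than` helper and the sample folder names are hypothetical; the name/type shape of each item follows what Get Metadata returns for childItems.

```python
def folders_older_than(child_items, cutoff):
    """Mirror the Filter condition:
    @greater(int(cutoff), int(substring(item().name, 0, 8)))
    """
    return [item for item in child_items
            if int(cutoff) > int(item["name"][:8])]


# Hypothetical childItems output, for illustration only.
child_items = [
    {"name": "20230404101500", "type": "Folder"},
    {"name": "20230405235959", "type": "Folder"},
    {"name": "20230406000000", "type": "Folder"},
    {"name": "20230410083000", "type": "Folder"},
]
old = folders_older_than(child_items, "20230406")
print([item["name"] for item in old])
# ['20230404101500', '20230405235959']
```

Only the first eight characters (yyyyMMdd) of each name take part in the comparison, so the time portion is ignored. In this sketch a name that does not start with eight digits makes int() raise, so the approach assumes the table folder contains only timestamp-named children.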

Pass this array to a ForEach activity as @activity('Filter1').output.Value, and use a Delete activity inside the ForEach.

For the Delete activity, use a dataset with a dataset parameter for the folder name.

Inside the ForEach, pass @item().name as the value of that parameter in the Delete activity.
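
The ForEach plus Delete step can be rehearsed locally before running the pipeline. The sketch below uses a temporary local directory as a stand-in for the data lake; `delete_folders`, the folder names, and the parquet file name are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def delete_folders(root: Path, names: list[str]) -> list[str]:
    """Delete each named child folder of root, including its contents."""
    deleted = []
    for name in names:             # one at a time, like "isSequential": true
        target = root / name       # plays the role of the dataset's folder parameter
        if target.is_dir():
            shutil.rmtree(target)  # removes files too, like "recursive": true
            deleted.append(name)
    return deleted


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "snapshot" / "tablename"
    for name in ("20230404101500", "20230410083000"):
        (root / name).mkdir(parents=True)
        (root / name / "part-0000.parquet").write_bytes(b"")
    # Names as they would arrive from the Filter output.
    deleted = delete_folders(root, ["20230404101500"])
    remaining = sorted(p.name for p in root.iterdir())

print(deleted, remaining)
# ['20230404101500'] ['20230410083000']
```

Each folder is removed together with the files inside it, which is why the Delete activity in the pipeline sets recursive to true.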

After the pipeline runs, the folders older than 5 days are deleted.

Here is my pipeline JSON for reference:

{
    "name": "Pipeline 1",
    "properties": {
        "activities": [
            {
                "name": "Get Metadata1",
                "type": "GetMetadata",
                "dependsOn": [
                    {
                        "activity": "Set variable1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "dataset": {
                        "referenceName": "source_folder",
                        "type": "DatasetReference"
                    },
                    "fieldList": [
                        "childItems"
                    ],
                    "storeSettings": {
                        "type": "AzureBlobFSReadSettings",
                        "enablePartitionDiscovery": false
                    },
                    "formatSettings": {
                        "type": "DelimitedTextReadSettings"
                    }
                }
            },
            {
                "name": "Set variable1",
                "type": "SetVariable",
                "dependsOn": [],
                "userProperties": [],
                "typeProperties": {
                    "variableName": "last_5thday",
                    "value": {
                        "value": "@addDays(utcnow(),pipeline().parameters.ndays,'yyyyMMdd')",
                        "type": "Expression"
                    }
                }
            },
            {
                "name": "Filter1",
                "type": "Filter",
                "dependsOn": [
                    {
                        "activity": "Get Metadata1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Get Metadata1').output.childItems",
                        "type": "Expression"
                    },
                    "condition": {
                        "value": "@greater(int(variables('last_5thday')), int(substring(item().name,0,8)))",
                        "type": "Expression"
                    }
                }
            },
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "Filter1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Filter1').output.Value",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "Delete1",
                            "type": "Delete",
                            "dependsOn": [],
                            "policy": {
                                "timeout": "0.12:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "dataset": {
                                    "referenceName": "Binary1",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "folder": {
                                            "value": "@item().name",
                                            "type": "Expression"
                                        }
                                    }
                                },
                                "enableLogging": false,
                                "storeSettings": {
                                    "type": "AzureBlobFSReadSettings",
                                    "recursive": true,
                                    "enablePartitionDiscovery": false
                                }
                            }
                        }
                    ]
                }
            }
        ],
        "parameters": {
            "ndays": {
                "type": "int",
                "defaultValue": -4
            }
        },
        "variables": {
            "last_5thday": {
                "type": "String"
            },
            "test": {
                "type": "String"
            }
        },
        "annotations": []
    }
}

Regarding azure - How to delete folders in an Azure Data Lake Gen2 using ADF, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/75908433/