azure - Error ingesting data from an Azure append blob into a Kusto database using Azure Data Factory

Reposted by: 行者123. Last updated: 2023-12-03 06:17:58

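I have an Azure append blob (sharing.json) whose content type is application/json. I am trying to ingest it into a Kusto database using Azure Data Factory (ADF), but the ingestion always fails. The ADF output shows the following error: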

"errors": [
        {
            "Code": 23302,
            "Message": "ErrorCode=KustoWriteFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Write to Kusto failed with following error: 'An error occurred for source: 'DataReader'. Error: '''.,Source=Microsoft.DataTransfer.Runtime.KustoConnector,''Type=Kusto.Ingest.Exceptions.IngestClientException,Message=An error occurred for source: 'DataReader'. Error: '',Source=Kusto.Ingest,'",
            "EventType": 0,
            "Category": 5,
            "Data": {},
            "MsgId": null,
            "ExceptionType": null,
            "Source": null,
            "StackTrace": null,
            "InnerEventInfos": []
        }
    ]
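
The `Message` field packs several nested exceptions into a single string, which makes it hard to read. As a rough reading aid, the sketch below pulls out the `ErrorCode` and each `Type=` and `Source=` value with regular expressions. The function name `summarize_adf_error` and the patterns are illustrative assumptions about this particular message layout, not part of any ADF API.

```python
import re

def summarize_adf_error(message: str) -> dict:
    """Extract the error code and the nested exception types/sources from an ADF error message."""
    code = re.search(r"ErrorCode=([^,']+)", message)
    return {
        "error_code": code.group(1) if code else None,
        # Each nested exception is introduced by "Type=<fully qualified name>".
        "exception_types": re.findall(r"Type=([\w.]+)", message),
        # Each nested exception names the component that raised it as "Source=<name>".
        "sources": re.findall(r"Source=([\w.]+)", message),
    }

# The message string from the error output above.
message = (
    "ErrorCode=KustoWriteFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,"
    "Message=Write to Kusto failed with following error: 'An error occurred for source: 'DataReader'. Error: '''.,"
    "Source=Microsoft.DataTransfer.Runtime.KustoConnector,"
    "''Type=Kusto.Ingest.Exceptions.IngestClientException,"
    "Message=An error occurred for source: 'DataReader'. Error: '',Source=Kusto.Ingest,'"
)
summary = summarize_adf_error(message)
```

Read this way, the innermost exception comes from `Kusto.Ingest` and its `Error:` text is empty, so the message alone does not say which setting is at fault.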

I tried getting help from ChatGPT and other online resources, but have had no success so far.

Here is my ADF copy activity (pipeline) configuration:

{
    "name": "CopyPipeline_k0h",
    "properties": {
        "activities": [
            {
                "name": "Copy_k0h",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 3,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [
                    {
                        "name": "Source",
                        "value": "sil-xms-load-max-data//sharing.json"
                    },
                    {
                        "name": "Destination",
                        "value": "AggregatedSharingTest_v1"
                    }
                ],
                "typeProperties": {
                    "source": {
                        "type": "JsonSource",
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "recursive": true,
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "JsonReadSettings"
                        }
                    },
                    "sink": {
                        "type": "AzureDataExplorerSink",
                        "ingestionMappingName": "",
                        "additionalProperties": {
                            "tags": "drop-by:loadtest",
                            "format": "multijson"
                        }
                    },
                    "enableStaging": false,
                    "validateDataConsistency": false,
                    "logSettings": {
                        "enableCopyActivityLog": true,
                        "copyActivityLogSettings": {
                            "logLevel": "Info",
                            "enableReliableLogging": true
                        },
                        "logLocationSettings": {
                            "linkedServiceName": {
                                "referenceName": "LoadTestBlob",
                                "type": "LinkedServiceReference"
                            },
                            "path": "debug-logs"
                        }
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "mappings": [
                            {
                                "source": {
                                    "path": "$['deviceId']"
                                },
                                "sink": {
                                    "name": "deviceId",
                                    "type": "String"
                                }
                            },
                            {
                                "source": {
                                    "path": "$['tenant']"
                                },
                                "sink": {
                                    "name": "tenant",
                                    "type": "String"
                                }
                            },
                            {
                                "source": {
                                    "path": "$['tagsSerialNo']"
                                },
                                "sink": {
                                    "name": "tagsSerialNo",
                                    "type": "String"
                                }
                            },
                            {
                                "source": {
                                    "path": "$['metricSum']"
                                },
                                "sink": {
                                    "name": "metricSum",
                                    "type": "Int64"
                                }
                            },
                            {
                                "source": {
                                    "path": "$['metricCount']"
                                },
                                "sink": {
                                    "name": "metricCount",
                                    "type": "Int64"
                                }
                            },
                            {
                                "source": {
                                    "path": "$['notMetricCount']"
                                },
                                "sink": {
                                    "name": "notMetricCount",
                                    "type": "Int64"
                                }
                            },
                            {
                                "source": {
                                    "path": "$['timestamp']"
                                },
                                "sink": {
                                    "name": "timestamp",
                                    "type": "DateTime"
                                }
                            }
                        ],
                        "collectionReference": ""
                    }
                },
                "inputs": [
                    {
                        "referenceName": "SourceDataset_k0h",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "DestinationDataset_k0h",
                        "type": "DatasetReference"
                    }
                ]
            }
        ],
        "annotations": [],
        "lastPublishTime": "2023-04-18T11:30:35Z"
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}

Here is the destination dataset configuration in ADF:

{
    "name": "DestinationDataset_k0h",
    "properties": {
        "linkedServiceName": {
            "referenceName": "LoadTestDump",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "AzureDataExplorerTable",
        "schema": [
            {
                "name": "deviceId",
                "type": "string"
            },
            {
                "name": "tenant",
                "type": "string"
            },
            {
                "name": "tagsSerialNo",
                "type": "string"
            },
            {
                "name": "metricSum",
                "type": "long"
            },
            {
                "name": "metricCount",
                "type": "long"
            },
            {
                "name": "notMetricCount",
                "type": "long"
            },
            {
                "name": "timestamp",
                "type": "datetime"
            }
        ],
        "typeProperties": {
            "table": "AggregatedSharingTest_v1"
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}
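
One thing worth ruling out with a setup like this is a mismatch between the translator mappings in the pipeline and the columns declared in the destination dataset. The sketch below is a hypothetical helper (`check_mappings` is not an ADF function) that compares the two. The `ADF_TO_KUSTO` table is an assumed correspondence covering only the type names that appear in this post.

```python
# Assumed correspondence between ADF interim type names and Kusto column types.
ADF_TO_KUSTO = {"String": "string", "Int64": "long", "DateTime": "datetime", "Guid": "guid"}

def check_mappings(mappings: list, schema: list) -> list:
    """Return human-readable problems found when comparing sink mappings to the dataset schema."""
    columns = {col["name"]: col["type"] for col in schema}
    problems = []
    for mapping in mappings:
        name = mapping["sink"]["name"]
        adf_type = mapping["sink"]["type"]
        if name not in columns:
            problems.append(f"{name}: not in destination schema")
        elif ADF_TO_KUSTO.get(adf_type) != columns[name]:
            problems.append(f"{name}: mapping type {adf_type} vs column type {columns[name]}")
    return problems

# Trimmed copies of the question's mappings and destination schema.
mappings = [
    {"source": {"path": "$['deviceId']"}, "sink": {"name": "deviceId", "type": "String"}},
    {"source": {"path": "$['metricSum']"}, "sink": {"name": "metricSum", "type": "Int64"}},
    {"source": {"path": "$['timestamp']"}, "sink": {"name": "timestamp", "type": "DateTime"}},
]
schema = [
    {"name": "deviceId", "type": "string"},
    {"name": "metricSum", "type": "long"},
    {"name": "timestamp", "type": "datetime"},
]
problems = check_mappings(mappings, schema)
```

For the configuration in the question this check finds no problems, which suggests the column mappings are not where the failure comes from.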

Here is the Azure Blob Storage (source) dataset configuration in ADF:

{
    "name": "SourceDataset_k0h",
    "properties": {
        "linkedServiceName": {
            "referenceName": "LoadTestBlob",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "Json",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "fileName": "sharing.json",
                "container": "sil-xms-load-max-data"
            }
        },
        "schema": {
            "type": "object",
            "properties": {
                "deviceId": {
                    "type": "string"
                },
                "tenant": {
                    "type": "string"
                },
                "tagsSerialNo": {
                    "type": "string"
                },
                "metricSum": {
                    "type": "integer"
                },
                "metricCount": {
                    "type": "integer"
                },
                "notMetricCount": {
                    "type": "integer"
                },
                "timestamp": {
                    "type": "string"
                }
            }
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}

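I have tested both the source and destination connections in the Azure portal, and they look fine. I am not sure what is going wrong: the pipeline runs, and the run details show data read and data written, but the data never becomes queryable in the Kusto table, and the run eventually fails with the error above.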

Best answer

I tried to reproduce this with your input JSON in a storage account and your pipeline JSON, and ended up with the same error.

In your case, the cause of this error is the additionalProperties setting in the copy activity sink.

After I removed additionalProperties, I was able to copy the data successfully.
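
If the pipeline definition is kept as a JSON file, the same change can be applied programmatically. The following is a minimal sketch under that assumption: `strip_sink_additional_properties` is a hypothetical helper name, and it only edits an in-memory dict, so publishing the result back to ADF is left out.

```python
import copy

def strip_sink_additional_properties(pipeline: dict) -> dict:
    """Return a copy of the pipeline with additionalProperties removed from each Copy activity sink."""
    result = copy.deepcopy(pipeline)  # leave the caller's dict untouched
    for activity in result.get("properties", {}).get("activities", []):
        if activity.get("type") != "Copy":
            continue
        sink = activity.get("typeProperties", {}).get("sink", {})
        sink.pop("additionalProperties", None)
    return result

# Trimmed copy of the pipeline from the question.
pipeline = {
    "name": "CopyPipeline_k0h",
    "properties": {
        "activities": [
            {
                "name": "Copy_k0h",
                "type": "Copy",
                "typeProperties": {
                    "sink": {
                        "type": "AzureDataExplorerSink",
                        "ingestionMappingName": "",
                        "additionalProperties": {
                            "tags": "drop-by:loadtest",
                            "format": "multijson",
                        },
                    }
                },
            }
        ]
    },
}
fixed = strip_sink_additional_properties(pipeline)
```

Working on a deep copy keeps the original definition available for comparison.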

My Kusto table already contained 4 rows of data. After removing additionalProperties, the copy activity inserted two rows from the source.

[Screenshot: data in the destination table]

Here is my pipeline JSON for your reference:

{
    "name": "pipeline2",
    "properties": {
        "activities": [
            {
                "name": "Copy data1",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [
                    {
                        "name": "Source",
                        "value": "data//myjson.json"
                    },
                    {
                        "name": "Destination",
                        "value": "table1"
                    }
                ],
                "typeProperties": {
                    "source": {
                        "type": "JsonSource",
                        "storeSettings": {
                            "type": "AzureBlobFSReadSettings",
                            "recursive": true,
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "JsonReadSettings"
                        }
                    },
                    "sink": {
                        "type": "AzureDataExplorerSink",
                        "ingestionMappingName": ""
                    },
                    "enableStaging": false,
                    "logSettings": {
                        "enableCopyActivityLog": true,
                        "copyActivityLogSettings": {
                            "logLevel": "Info",
                            "enableReliableLogging": true
                        },
                        "logLocationSettings": {
                            "linkedServiceName": {
                                "referenceName": "AzureDataLakeStorage2",
                                "type": "LinkedServiceReference"
                            },
                            "path": "data/debug-logs"
                        }
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "mappings": [
                            {
                                "source": {
                                    "path": "$['deviceId']"
                                },
                                "sink": {
                                    "name": "deviceId",
                                    "type": "String"
                                }
                            },
                            {
                                "source": {
                                    "path": "$['tenant']"
                                },
                                "sink": {
                                    "name": "tenant",
                                    "type": "Guid"
                                }
                            },
                            {
                                "source": {
                                    "path": "$['tagsSerialNo']"
                                },
                                "sink": {
                                    "name": "tagsSerialNo",
                                    "type": "String"
                                }
                            },
                            {
                                "source": {
                                    "path": "$['metricSum']"
                                },
                                "sink": {
                                    "name": "metricSum",
                                    "type": "Int64"
                                }
                            },
                            {
                                "source": {
                                    "path": "$['metricCount']"
                                },
                                "sink": {
                                    "name": "metricCount",
                                    "type": "Int64"
                                }
                            },
                            {
                                "source": {
                                    "path": "$['notMetricCount']"
                                },
                                "sink": {
                                    "name": "notMetricCount",
                                    "type": "Int64"
                                }
                            },
                            {
                                "source": {
                                    "path": "$['timestamp']"
                                },
                                "sink": {
                                    "name": "timestamp",
                                    "type": "DateTime"
                                }
                            }
                        ],
                        "collectionReference": ""
                    }
                },
                "inputs": [
                    {
                        "referenceName": "Json1",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "AzureDataExplorerTable1",
                        "type": "DatasetReference"
                    }
                ]
            }
        ],
        "annotations": []
    }
}
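
When comparing this pipeline with the one in the question, it helps to list exactly which settings differ. The answer attributes the fix to additionalProperties, but the two definitions also differ elsewhere, for example in the source `storeSettings` type. The sketch below uses a hypothetical helper, `diff_dicts`, to list the differing key paths for a trimmed `typeProperties` from each pipeline.

```python
_MISSING = "<absent>"  # marker for a key that exists on only one side

def diff_dicts(left: dict, right: dict, prefix: str = "") -> list:
    """List (path, left_value, right_value) for every key whose value differs."""
    changes = []
    for key in sorted(set(left) | set(right)):
        path = f"{prefix}.{key}" if prefix else key
        lval, rval = left.get(key, _MISSING), right.get(key, _MISSING)
        if isinstance(lval, dict) and isinstance(rval, dict):
            changes.extend(diff_dicts(lval, rval, path))  # recurse into nested objects
        elif lval != rval:
            changes.append((path, lval, rval))
    return changes

# Trimmed typeProperties from the question's pipeline and from the answer's pipeline.
question_props = {
    "source": {"storeSettings": {"type": "AzureBlobStorageReadSettings"}},
    "sink": {
        "type": "AzureDataExplorerSink",
        "ingestionMappingName": "",
        "additionalProperties": {"tags": "drop-by:loadtest", "format": "multijson"},
    },
}
answer_props = {
    "source": {"storeSettings": {"type": "AzureBlobFSReadSettings"}},
    "sink": {"type": "AzureDataExplorerSink", "ingestionMappingName": ""},
}
changes = diff_dicts(question_props, answer_props)
```

Running the same comparison on the full JSON documents would also surface the other differences, such as the `tenant` column being mapped to `Guid` here rather than `String`.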

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/76063959/
