
azure-sql-database - Why does Azure Data Factory seem to insist on inserting DateTimes as strings?


I am trying to set up an Azure Data Factory that uses a data flow to copy and denormalize data from one Azure SQL database into another Azure SQL database for reporting/BI purposes, but I have run into a problem inserting dates.

This is the definition of my data flow:

{
    "name": "dataflow1",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                {
                    "dataset": {
                        "referenceName": "AzureSqlTable1",
                        "type": "DatasetReference"
                    },
                    "name": "source1"
                }
            ],
            "sinks": [
                {
                    "dataset": {
                        "referenceName": "AzureSqlTable2",
                        "type": "DatasetReference"
                    },
                    "name": "sink1"
                }
            ],
            "script": "\n\nsource(output(\n\t\tBucketId as string,\n\t\tStreamId as string,\n\t\tStreamIdOriginal as string,\n\t\tStreamRevision as integer,\n\t\tItems as integer,\n\t\tCommitId as string,\n\t\tCommitSequence as integer,\n\t\tCommitStamp as timestamp,\n\t\tCheckpointNumber as long,\n\t\tDispatched as boolean,\n\t\tHeaders as binary,\n\t\tPayload as binary\n\t),\n\tallowSchemaDrift: true,\n\tvalidateSchema: false,\n\tisolationLevel: 'READ_UNCOMMITTED',\n\tformat: 'table') ~> source1\nsource1 sink(allowSchemaDrift: true,\n\tvalidateSchema: false,\n\tformat: 'table',\n\tdeletable:false,\n\tinsertable:true,\n\tupdateable:false,\n\tupsertable:false,\n\tmapColumn(\n\t\tBucketId,\n\t\tCommitStamp\n\t)) ~> sink1"
        }
    }
}

This is the definition of my source dataset:
{
    "name": "AzureSqlTable1",
    "properties": {
        "linkedServiceName": {
            "referenceName": "Source_Test",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "AzureSqlTable",
        "schema": [
            {
                "name": "BucketId",
                "type": "varchar"
            },
            {
                "name": "StreamId",
                "type": "char"
            },
            {
                "name": "StreamIdOriginal",
                "type": "nvarchar"
            },
            {
                "name": "StreamRevision",
                "type": "int",
                "precision": 10
            },
            {
                "name": "Items",
                "type": "tinyint",
                "precision": 3
            },
            {
                "name": "CommitId",
                "type": "uniqueidentifier"
            },
            {
                "name": "CommitSequence",
                "type": "int",
                "precision": 10
            },
            {
                "name": "CommitStamp",
                "type": "datetime2",
                "scale": 7
            },
            {
                "name": "CheckpointNumber",
                "type": "bigint",
                "precision": 19
            },
            {
                "name": "Dispatched",
                "type": "bit"
            },
            {
                "name": "Headers",
                "type": "varbinary"
            },
            {
                "name": "Payload",
                "type": "varbinary"
            }
        ],
        "typeProperties": {
            "tableName": "[dbo].[Commits]"
        }
    }
}

And the sink dataset:
{
    "name": "AzureSqlTable2",
    "properties": {
        "linkedServiceName": {
            "referenceName": "Dest_Test",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "AzureSqlTable",
        "schema": [],
        "typeProperties": {
            "tableName": "dbo.Test2"
        }
    }
}

When I run the pipeline with this data flow, I get the following error:
Activity dataflow1 failed: DF-EXEC-1 Conversion failed when converting date and/or time from character string.
com.microsoft.sqlserver.jdbc.SQLServerException: Conversion failed when converting date and/or time from character string.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:258)
at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:256)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:108)
at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:28)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.doInsertBulk(SQLServerBulkCopy.java:1611)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.access$200(SQLServerBulkCopy.java:58)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy$1InsertBulk.doExecute(SQLServerBulkCopy.java:709)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.sendBulkLoadBCP(SQLServerBulkCopy.java:739)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:1684)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:669)
at com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions.com$microsoft$azure$sqldb$spark$connect$DataFrameFunctions$$bulkCopy(DataFrameFunctions.scala:127)
at com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions$$anonfun$bulkCopyToSqlDB$1.apply(DataFrameFunctions.scala:72)
at com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions$$anonfun$bulkCopyToSqlDB$1.apply(DataFrameFunctions.scala:72)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:948)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:948)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2226)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2226)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:124)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:459)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1401)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

My Azure SQL audit log shows the following failing statement (which is not surprising, given that it uses VARCHAR(50) as the type for [CommitStamp]):
INSERT BULK dbo.T_301fcb5e4a4148d4a48f2943011b2f04 (
[BucketId] NVARCHAR(MAX),
[CommitStamp] VARCHAR(50),
[StreamId] NVARCHAR(MAX),
[StreamIdOriginal] NVARCHAR(MAX),
[StreamRevision] INT,
[Items] INT,
[CommitId] NVARCHAR(MAX),
[CommitSequence] INT,
[CheckpointNumber] BIGINT,
[Dispatched] BIT,
[Headers] VARBINARY(MAX),
[Payload] VARBINARY(MAX),
[r8e440f7252bb401b9ead107597de6293] INT)
with (ROWS_PER_BATCH = 4096, TABLOCK)

I have absolutely no idea why this is happening. The schema information looks correct, but somehow Data Factory / the data flow seems to want to insert CommitStamp as a string type.
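The conversion failure itself is easy to reproduce in plain T-SQL: because the bulk load presents [CommitStamp] as VARCHAR(50), every incoming value has to be implicitly converted to the target column's datetime2 type, and any value SQL Server cannot parse raises exactly this error. A minimal sketch, assuming the sink column in dbo.Test2 is datetime2 (the table variable and sample values below are made up for illustration):

-- Minimal illustration of the failure mode (not the actual bulk-copy path).
-- Assumes the sink column is datetime2, as in the source schema.
DECLARE @t TABLE (CommitStamp datetime2(7));

-- ISO 8601 text converts implicitly, so this row succeeds:
INSERT INTO @t (CommitStamp) VALUES ('2019-07-11T08:15:00.1234567');

-- A value SQL Server cannot parse as a date fails with the same message:
-- Msg 241: Conversion failed when converting date and/or time from character string.
INSERT INTO @t (CommitStamp) VALUES ('2019-07-32 08:15:00');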

As requested, here is the output of the data flow Code/Plan view:


source(output(
        BucketId as string,
        StreamId as string,
        StreamIdOriginal as string,
        StreamRevision as integer,
        Items as integer,
        CommitId as string,
        CommitSequence as integer,
        CommitStamp as timestamp,
        CheckpointNumber as long,
        Dispatched as boolean,
        Headers as binary,
        Payload as binary
    ),
    allowSchemaDrift: true,
    validateSchema: false,
    isolationLevel: 'READ_UNCOMMITTED',
    format: 'table',
    schemaName: '[dbo]',
    tableName: '[Commits]',
    store: 'sqlserver',
    server: 'sign2025-sqldata.database.windows.net',
    database: 'SignPath.Application',
    user: 'Sign2025Admin',
    password: '**********') ~> source1
source1 sink(allowSchemaDrift: true,
    validateSchema: false,
    format: 'table',
    deletable:false,
    insertable:true,
    updateable:false,
    upsertable:false,
    mapColumn(
        BucketId,
        CommitStamp
    ),
    schemaName: 'dbo',
    tableName: 'Test2',
    store: 'sqlserver',
    server: 'sign2025-sqldata.database.windows.net',
    database: 'SignPath.Reporting',
    user: 'Sign2025Admin',
    password: '**********') ~> sink1

Best Answer

I created a data flow that copies data from one Azure SQL database to another Azure SQL database. It successfully converted datetime2 to VARCHAR(50).

This is the definition of my data flow:

{
    "name": "dataflow1",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                {
                    "dataset": {
                        "referenceName": "DestinationDataset_sto",
                        "type": "DatasetReference"
                    },
                    "name": "source1"
                }
            ],
            "sinks": [
                {
                    "dataset": {
                        "referenceName": "DestinationDataset_mex",
                        "type": "DatasetReference"
                    },
                    "name": "sink1"
                }
            ],
            "script": "\n\nsource(output(\n\t\tID as integer,\n\t\ttName as string,\n\t\tmyTime as timestamp\n\t),\n\tallowSchemaDrift: true,\n\tvalidateSchema: false,\n\tisolationLevel: 'READ_UNCOMMITTED',\n\tformat: 'table') ~> source1\nsource1 sink(input(\n\t\tID as integer,\n\t\ttName as string,\n\t\tmyTime as string\n\t),\n\tallowSchemaDrift: true,\n\tvalidateSchema: false,\n\tformat: 'table',\n\tdeletable:false,\n\tinsertable:true,\n\tupdateable:false,\n\tupsertable:false) ~> sink1"
        }
    }
}

The definition of my source dataset:
{
    "name": "DestinationDataset_sto",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureSqlDatabase1",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "AzureSqlTable",
        "schema": [
            {
                "name": "ID",
                "type": "int",
                "precision": 10
            },
            {
                "name": "tName",
                "type": "varchar"
            },
            {
                "name": "myTime",
                "type": "datetime2",
                "scale": 7
            }
        ],
        "typeProperties": {
            "tableName": "[dbo].[demo]"
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}

My sink dataset settings:
{
    "name": "DestinationDataset_mex",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureSqlDatabase1",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "AzureSqlTable",
        "schema": [
            {
                "name": "ID",
                "type": "int",
                "precision": 10
            },
            {
                "name": "tName",
                "type": "varchar"
            },
            {
                "name": "myTime",
                "type": "varchar"
            }
        ],
        "typeProperties": {
            "tableName": "[dbo].[demo1]"
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}

These are my data flow steps; the screenshots of each step are in the original Stack Overflow post.

Step 1: source settings.

Step 2: sink settings.

The run succeeded.

Apart from myTime, the tables demo and demo1 have almost the same schema.
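For reference, this is roughly what the two tables look like in plain T-SQL, based on the dataset schemas above (the column lengths and nullability are assumptions, since the post only gives the base types):

-- Source table: myTime is datetime2(7), matching DestinationDataset_sto.
CREATE TABLE dbo.demo
(
    ID     int          NOT NULL,
    tName  varchar(50)  NULL,
    myTime datetime2(7) NULL
);

-- Sink table: identical except that myTime is a string column, which is the
-- conversion the data flow handled successfully.
CREATE TABLE dbo.demo1
(
    ID     int          NOT NULL,
    tName  varchar(50)  NULL,
    myTime varchar(50)  NULL
);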

My source table and its data (screenshot in the original post).

My sink table and the data copied over from demo (screenshot in the original post).

The data flow Plan:
source(output(
        ID as integer,
        tName as string,
        myTime as timestamp
    ),
    allowSchemaDrift: true,
    validateSchema: true,
    isolationLevel: 'SERIALIZABLE',
    format: 'table',
    schemaName: '[dbo]',
    tableName: '[demo]',
    store: 'sqlserver',
    server: '****.database.windows.net',
    database: '****',
    user: 'ServerAdmin',
    password: '**********') ~> source1
source1 sink(input(
        ID as integer,
        tName as string,
        myTime as string
    ),
    allowSchemaDrift: true,
    validateSchema: false,
    format: 'table',
    deletable:false,
    insertable:true,
    updateable:false,
    upsertable:false,
    schemaName: '[dbo]',
    tableName: '[demo1]',
    store: 'sqlserver',
    server: '****.database.windows.net',
    database: '****',
    user: 'ServerAdmin',
    password: '**********') ~> sink1

Update 1:

I created the sink table manually and found that:

Data Flow can convert datetime2 to VARCHAR() (and probably NVARCHAR()), date, and datetimeoffset.



When I try the date/time types time, datetime, datetime2, or smalldatetime as the sink column type, the data flow always fails with:
"message": "DF-EXEC-1 Conversion failed when converting date and/or time from character string."

Update 2019-07-11:

I asked Azure Support for help, and they replied that this is a bug in Data Flow with no solution at the moment (their reply is shown as a screenshot in the original post).

Update 2019-07-12:

I tested this together with Azure Support, and they consider it a bug. Their new email is shown as a screenshot in the original post.

They also told me that a fix has already been made and will ship with the next deployment train, probably by the end of next week.

Hope this helps.

Regarding "azure-sql-database - Why does Azure Data Factory seem to insist on inserting DateTimes as strings?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56948054/
