gpt4 book ai didi

通过 Azure 数据工厂 V2 将 XML 数据传输到 Azure SQL Server

转载 作者:行者123 更新时间:2023-12-03 03:22:43 24 4
gpt4 key购买 nike

在数据基础设施和管道方面,我是一位经验丰富的 Azure 用户。然而,我是处理 XML 数据的新手,而且我的具体情况似乎有些复杂。在耗尽我的资源后,我向社区寻求帮助。

请注意,显示的所有数据均为演示数据。没有 PII 被泄露

XML 数据存在于我的 blob (ADLS2) 中。我正在使用数据工厂 V2 复制事件将数据复制到我的 Azure SQL 数据库中。所有权限均已适当配置。

请注意,在“申请人”部分内的 XML 中,您可以看到始终存在的“主要”的“类型”。如果有共同借款人,那么将会有第二个“申请人”部分,类型为“共同借款人”。

有问题的 XML 部分:

<LoanExport>
<Application>
<GeneratedDate>06/15/2023</GeneratedDate>
<BusinessApplication/>
<ApplicationNumber>132</ApplicationNumber>
<ApplicationType>Joint</ApplicationType>
<Business>false</Business>
<Applicants>
<Applicant>
<Type>primary</Type>
<FirstName>Robert</FirstName>
<LastName>Ice</LastName>
<MiddleName/>
<NameSuffix/>
<DateOfBirth>02/22/1992</DateOfBirth>
<CreditScore/>
<SocialSecurityNumber>666423221</SocialSecurityNumber>
<IsUSCitizen>true</IsUSCitizen>
<PhoneNumber>5555555555</PhoneNumber>
<WorkPhoneNumber/>
<CellularPhoneNumber>3334445555</CellularPhoneNumber>
<EmailAddress><a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="b0d4d5dddfdddd8981cad6c4f0c4d3d9d3c2d5d4d9c49ed3dfdd" rel="noreferrer noopener nofollow">[email protected]</a></EmailAddress>
<DriversLicense/>
<DriversLicenseState/>
<DriversLicenseExpirationDate/>
<DriversLicenseIssueDate/>
<MaritalStatus>false</MaritalStatus>
<Addresses>
<Address>
<Type>Primary</Type>
<Address>
<Number>1</Number>
<StreetPrefix/>
<StreetName>Hawk Drive</StreetName>
<StreetType/>
<StreetSuffix/>
<ApartmentOrSuiteNumber/>
<PostalCode>60750</PostalCode>
<City>FANTASY ISLAND</City>
<State>MD</State>
</Address>
<TimeAtResidenceYears>5</TimeAtResidenceYears>
<TimeAtResidenceMonths>7</TimeAtResidenceMonths>
<Ownership>Own</Ownership>
<MonthlyPayment currency="USD">550.00</MonthlyPayment>
</Address>
</Addresses>
<Employments>
<Employment>
<isCurrentEmployment>true</isCurrentEmployment>
<Status>Employed</Status>
<EmployerName>TCI</EmployerName>
<ContactName/>
<ContactPhone>1111111111</ContactPhone>
<JobPosition>Business Analyst</JobPosition>
<Address>
<StreetName/>
<PostalCode/>
<City/>
<State/>
</Address>
<TimeAtEmployerYears>10</TimeAtEmployerYears>
<TimeAtEmployerMonths>6</TimeAtEmployerMonths>
<GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
</Employment>
</Employments>
<DenialReasons>
<DenialReason>
<Reason>Applicant has no SS number</Reason>
</DenialReason>
</DenialReasons>
</Applicant>
<Applicant>
<Type>coBorrower</Type>
<FirstName>asd</FirstName>
<LastName>asd</LastName>
<MiddleName>asd</MiddleName>
<NameSuffix>IV</NameSuffix>
<DateOfBirth>07/17/1973</DateOfBirth>
<CreditScore/>
<SocialSecurityNumber>333333333</SocialSecurityNumber>
<IsUSCitizen>true</IsUSCitizen>
<PhoneNumber>1231231231</PhoneNumber>
<WorkPhoneNumber/>
<CellularPhoneNumber>1231231231</CellularPhoneNumber>
<EmailAddress><a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="187c7c7c7c586179707777367b7775" rel="noreferrer noopener nofollow">[email protected]</a></EmailAddress>
<DriversLicense/>
<DriversLicenseState/>
<DriversLicenseExpirationDate/>
<DriversLicenseIssueDate/>
<MaritalStatus>false</MaritalStatus>
<Addresses>
<Address>
<Type>Primary</Type>
<Address>
<Number/>
<StreetPrefix/>
<StreetName>asdasd</StreetName>
<StreetType/>
<StreetSuffix/>
<ApartmentOrSuiteNumber/>
<PostalCode>12345</PostalCode>
<City>Schenectady</City>
<State>NY</State>
</Address>
<TimeAtResidenceYears>17</TimeAtResidenceYears>
<TimeAtResidenceMonths>11</TimeAtResidenceMonths>
<Ownership>Living with relatives</Ownership>
<MonthlyPayment currency="USD">2222.00</MonthlyPayment>
</Address>
</Addresses>
<Employments>
<Employment>
<isCurrentEmployment>true</isCurrentEmployment>
<Status>Retired</Status>
<EmployerName/>
<ContactName/>
<ContactPhone/>
<JobPosition/>
<Address>
<StreetName/>
<PostalCode/>
<City/>
<State/>
</Address>
<TimeAtEmployerYears>0</TimeAtEmployerYears>
<TimeAtEmployerMonths>0</TimeAtEmployerMonths>
<GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
</Employment>
</Employments>
<DenialReasons>
<DenialReason>
<Reason>Applicant has no SS number</Reason>
</DenialReason>
</DenialReasons>
</Applicant>
</Applicants>

我正在使用一个小型 SQL 表来测试是否能够正确提取主要数据和共同借款人数据。

CREATE TABLE [bronze].[loan_origination_application](
[application_number] [nvarchar](max) NULL,
[generated_date] [nvarchar](max) NULL,
[applicant_primary_first_name] [nvarchar](max) NULL,
[applicant_coborrower_first_name] [nvarchar](max) NULL
)

这是我的 Azure 数据工厂管道 JSON 的部分:

                    },
"sink": {
"type": "AzureSqlSink",
"writeBehavior": "insert",
"sqlWriterUseTableLock": false,
"disableMetricsCollection": false
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"path": "['Application']['GeneratedDate']"
},
"sink": {
"name": "generated_date",
"type": "String"
}
},
{
"source": {
"path": "['Application']['ApplicationNumber']"
},
"sink": {
"name": "application_number",
"type": "String"
}
},
{
"source": {
"path": "['Application']['Applicants']['Applicant']['FirstName']"
},
"sink": {
"name": "applicant_primary_first_name",
"type": "String"
}
},
{
"source": {
"path": "['Application']['Applicants']['Applicant'][1]['FirstName']"
},
"sink": {
"name": "applicant_coborrower_first_name",
"type": "String"
}
}
],
"collectionReference": "$['applications']['LoanExport']",
"mapComplexValuesToString": true
}

请注意,在我的所有数据库输出中,4 列分别是:'application_number'、'generate_date'、'applicant_primary_first_name'、'applicant_coborrower_first_name'
目前,这会导致数据库中出现以下写入:

132 06/15/2023  NULL    asd
134 06/15/2023 Robert NULL
135 06/15/2023 Robert NULL
136 06/15/2023 Robert NULL

请注意,在这种情况下,如果有共同借款申请人(申请 132),它将提取正确的姓名,但会忽略主申请人。当没有共同借款申请人时,将准确拉入主申请人。

我尝试了以下以及其他类似的方法:

数据工厂 JSON:

                            {
"source": {
"path": "['Application']['Applicants']['Applicant'][0]['FirstName']"
},
"sink": {
"name": "applicant_primary_first_name",
"type": "String"
}
},
{
"source": {
"path": "['Application']['Applicants']['Applicant'][1]['FirstName']"
},
"sink": {
"name": "applicant_coborrower_first_name",
"type": "String"
}
}

数据库输出:

132 06/15/2023  Robert  asd
134 06/15/2023 NULL NULL
135 06/15/2023 NULL NULL
136 06/15/2023 NULL NULL

在这种情况下,您可以看到主要申请人和共同借款人申请人类型均已正确提取,但仅当有共同借款人时而不是只有主要申请人时。

我还尝试了一些涉及过滤的 Chat GPT 建议,尽管我觉得它走在正确的轨道上,但没有帮助。

                            {
"source": {
"path": "['Application']['Applicants']['Applicant'][?(@['Type']=='primary')]['FirstName']"
},
"sink": {
"name": "applicant_primary_first_name",
"type": "String"
}
},
{
"source": {
"path": "['Application']['Applicants']['Applicant'][?(@['Type']=='coBorrower')]['FirstName']"
},
"sink": {
"name": "applicant_coborrower_first_name",
"type": "String"
}
}

数据库输出与上面效果相同。仅当共同借款人类型的申请人也在场时,才会拉动主要类型的申请人。

132 06/15/2023  Robert  asd
134 06/15/2023 NULL NULL
135 06/15/2023 NULL NULL
136 06/15/2023 NULL NULL

为了清楚起见,我将提供仅包含主要类型申请人的申请的 XML。

<LoanExport>
<Application>
<GeneratedDate>06/15/2023</GeneratedDate>
<BusinessApplication/>
<ApplicationNumber>134</ApplicationNumber>
<ApplicationType>Individual</ApplicationType>
<Business>false</Business>
<Applicants>
<Applicant>
<Type>primary</Type>
<FirstName>Robert</FirstName>
<LastName>Ice</LastName>
<MiddleName/>
<NameSuffix/>
<DateOfBirth>02/22/1992</DateOfBirth>
<CreditScore>0</CreditScore>
<SocialSecurityNumber>666423221</SocialSecurityNumber>
<IsUSCitizen>true</IsUSCitizen>
<PhoneNumber>5555555555</PhoneNumber>
<WorkPhoneNumber/>
<CellularPhoneNumber>3334445555</CellularPhoneNumber>
<EmailAddress><a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="a0c4c5cdcf94d1d5d192d6d3e0d4c3c9c3d2c5c4c9d48ec3cfcd" rel="noreferrer noopener nofollow">[email protected]</a></EmailAddress>
<DriversLicense/>
<DriversLicenseState/>
<DriversLicenseExpirationDate/>
<DriversLicenseIssueDate/>
<MaritalStatus>false</MaritalStatus>
<Addresses>
<Address>
<Type>Primary</Type>
<Address>
<Number>1</Number>
<StreetPrefix/>
<StreetName>Hawk Drive</StreetName>
<StreetType/>
<StreetSuffix/>
<ApartmentOrSuiteNumber/>
<PostalCode>60750</PostalCode>
<City>FANTASY ISLAND</City>
<State>MD</State>
</Address>
<TimeAtResidenceYears>5</TimeAtResidenceYears>
<TimeAtResidenceMonths>7</TimeAtResidenceMonths>
<Ownership>Own</Ownership>
<MonthlyPayment currency="USD">550.00</MonthlyPayment>
</Address>
</Addresses>
<Employments>
<Employment>
<isCurrentEmployment>true</isCurrentEmployment>
<Status>Employed</Status>
<EmployerName>TCI</EmployerName>
<ContactName/>
<ContactPhone>1111111111</ContactPhone>
<JobPosition>Business Analyst</JobPosition>
<Address>
<StreetName/>
<PostalCode/>
<City/>
<State/>
</Address>
<TimeAtEmployerYears>10</TimeAtEmployerYears>
<TimeAtEmployerMonths>6</TimeAtEmployerMonths>
<GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
</Employment>
</Employments>
<Stipulations>
<Stipulation>
<Description>VOI Applicant</Description>
<Status>Active</Status>
</Stipulation>
</Stipulations>
</Applicant>
</Applicants>

最佳答案

Azure SQL DB 可以使用 OPENROWSET 直接从 Blob 存储中读取。请参阅该技术的开始 here .

对于您的特定 XML 片段,使用 CROSS APPLY 对其进行操作以更深入地研究它:

DECLARE @xml XML = '<LoanExport>
<Application>
<GeneratedDate>06/15/2023</GeneratedDate>
<BusinessApplication/>
<ApplicationNumber>132</ApplicationNumber>
<ApplicationType>Joint</ApplicationType>
<Business>false</Business>
<Applicants>
<Applicant>
<Type>primary</Type>
<FirstName>Robert</FirstName>
<LastName>Ice</LastName>
<MiddleName/>
<NameSuffix/>
<DateOfBirth>02/22/1992</DateOfBirth>
<CreditScore/>
<SocialSecurityNumber>666423221</SocialSecurityNumber>
<IsUSCitizen>true</IsUSCitizen>
<PhoneNumber>5555555555</PhoneNumber>
<WorkPhoneNumber/>
<CellularPhoneNumber>3334445555</CellularPhoneNumber>
<EmailAddress>dem<a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="385755550109425e4c784c5b515b4a5d5c514c165b5755" rel="noreferrer noopener nofollow">[email protected]</a></EmailAddress>
<DriversLicense/>
<DriversLicenseState/>
<DriversLicenseExpirationDate/>
<DriversLicenseIssueDate/>
<MaritalStatus>false</MaritalStatus>
<Addresses>
<Address>
<Type>Primary</Type>
<Address>
<Number>1</Number>
<StreetPrefix/>
<StreetName>Hawk Drive</StreetName>
<StreetType/>
<StreetSuffix/>
<ApartmentOrSuiteNumber/>
<PostalCode>60750</PostalCode>
<City>FANTASY ISLAND</City>
<State>MD</State>
</Address>
<TimeAtResidenceYears>5</TimeAtResidenceYears>
<TimeAtResidenceMonths>7</TimeAtResidenceMonths>
<Ownership>Own</Ownership>
<MonthlyPayment currency="USD">550.00</MonthlyPayment>
</Address>
</Addresses>
<Employments>
<Employment>
<isCurrentEmployment>true</isCurrentEmployment>
<Status>Employed</Status>
<EmployerName>TCI</EmployerName>
<ContactName/>
<ContactPhone>1111111111</ContactPhone>
<JobPosition>Business Analyst</JobPosition>
<Address>
<StreetName/>
<PostalCode/>
<City/>
<State/>
</Address>
<TimeAtEmployerYears>10</TimeAtEmployerYears>
<TimeAtEmployerMonths>6</TimeAtEmployerMonths>
<GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
</Employment>
</Employments>
<DenialReasons>
<DenialReason>
<Reason>Applicant has no SS number</Reason>
</DenialReason>
</DenialReasons>
</Applicant>
<Applicant>
<Type>coBorrower</Type>
<FirstName>asd</FirstName>
<LastName>asd</LastName>
<MiddleName>asd</MiddleName>
<NameSuffix>IV</NameSuffix>
<DateOfBirth>07/17/1973</DateOfBirth>
<CreditScore/>
<SocialSecurityNumber>333333333</SocialSecurityNumber>
<IsUSCitizen>true</IsUSCitizen>
<PhoneNumber>1231231231</PhoneNumber>
<WorkPhoneNumber/>
<CellularPhoneNumber>1231231231</CellularPhoneNumber>
<EmailAddress><a href="https://stackoverflow.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="197d7d7d7d596078717676377a7674" rel="noreferrer noopener nofollow">[email protected]</a></EmailAddress>
<DriversLicense/>
<DriversLicenseState/>
<DriversLicenseExpirationDate/>
<DriversLicenseIssueDate/>
<MaritalStatus>false</MaritalStatus>
<Addresses>
<Address>
<Type>Primary</Type>
<Address>
<Number/>
<StreetPrefix/>
<StreetName>asdasd</StreetName>
<StreetType/>
<StreetSuffix/>
<ApartmentOrSuiteNumber/>
<PostalCode>12345</PostalCode>
<City>Schenectady</City>
<State>NY</State>
</Address>
<TimeAtResidenceYears>17</TimeAtResidenceYears>
<TimeAtResidenceMonths>11</TimeAtResidenceMonths>
<Ownership>Living with relatives</Ownership>
<MonthlyPayment currency="USD">2222.00</MonthlyPayment>
</Address>
</Addresses>
<Employments>
<Employment>
<isCurrentEmployment>true</isCurrentEmployment>
<Status>Retired</Status>
<EmployerName/>
<ContactName/>
<ContactPhone/>
<JobPosition/>
<Address>
<StreetName/>
<PostalCode/>
<City/>
<State/>
</Address>
<TimeAtEmployerYears>0</TimeAtEmployerYears>
<TimeAtEmployerMonths>0</TimeAtEmployerMonths>
<GrossMonthlyIncome currency="USD">0.00</GrossMonthlyIncome>
</Employment>
</Employments>
<DenialReasons>
<DenialReason>
<Reason>Applicant has no SS number</Reason>
</DenialReason>
</DenialReasons>
</Applicant>
</Applicants>
</Application>
</LoanExport>'

--SELECT @xml;

SELECT
l.c.value('(ApplicationNumber/text())[1]', 'INT') AS ApplicationNumber,
l.c.value('(GeneratedDate/text())[1]', 'DATE') AS GeneratedDate,
a.c.value('(FirstName/text())[1]', 'VARCHAR(10)') AS FirstName,
a.c.value('(LastName/text())[1]', 'VARCHAR(10)') AS LastName,

l.c.query('.') l,
l.c.query('.') a

FROM @xml.nodes('LoanExport/Application') l(c)
CROSS APPLY l.c.nodes('Applicants/Applicant') a(c);

关于通过 Azure 数据工厂 V2 将 XML 数据传输到 Azure SQL Server,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/76534372/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com