
azure - Merging two CSV files in Azure Data Factory using a custom .NET activity

Reposted. Author: 行者123. Updated: 2023-12-03 00:12:27

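I have two CSV files, each with n columns. I need to merge them into a single CSV file. Both input files share one unique column.

I have searched blogs and websites thoroughly, and all of them point to using a custom .NET activity. So I went through this site

But I still cannot work out the C# code part. Can anyone share code showing how to merge these two CSV files with a custom .NET activity in Azure Data Factory?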

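Best answer

Below is an example of how to join two tab-delimited files on the ZIP_Code column using U-SQL. The example assumes both files are stored in Azure Data Lake Storage (ADLS). The script can easily be incorporated into a Data Factory pipeline: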

// Get raw input from file A
@inputA =
    EXTRACT
        Date_received string,
        Product string,
        Sub_product string,
        Issue string,
        Sub_issue string,
        Consumer_complaint_narrative string,
        Company_public_response string,
        Company string,
        State string,
        ZIP_Code string,
        Tags string,
        Consumer_consent_provided string,
        Submitted_via string,
        Date_sent_to_company string,
        Company_response_to_consumer string,
        Timely_response string,
        Consumer_disputed string,
        Complaint_ID string
    FROM "/input/input48A.txt"
    USING Extractors.Tsv();


// Get raw input from file B
@inputB =
    EXTRACT
        Provider_ID string,
        Hospital_Name string,
        Address string,
        City string,
        State string,
        ZIP_Code string,
        County_Name string,
        Phone_Number string,
        Hospital_Type string,
        Hospital_Ownership string,
        Emergency_Services string,
        Meets_criteria_for_meaningful_use_of_EHRs string,
        Hospital_overall_rating string,
        Hospital_overall_rating_footnote string,
        Mortality_national_comparison string,
        Mortality_national_comparison_footnote string,
        Safety_of_care_national_comparison string,
        Safety_of_care_national_comparison_footnote string,
        Readmission_national_comparison string,
        Readmission_national_comparison_footnote string,
        Patient_experience_national_comparison string,
        Patient_experience_national_comparison_footnote string,
        Effectiveness_of_care_national_comparison string,
        Effectiveness_of_care_national_comparison_footnote string,
        Timeliness_of_care_national_comparison string,
        Timeliness_of_care_national_comparison_footnote string,
        Efficient_use_of_medical_imaging_national_comparison string,
        Efficient_use_of_medical_imaging_national_comparison_footnote string,
        Location string
    FROM "/input/input48B.txt"
    USING Extractors.Tsv();


// Join the two files on the ZIP_Code column.
// The WHERE clause restricts the result to a single ZIP code for this example.
@output =
    SELECT b.Provider_ID,
           b.Hospital_Name,
           b.Address,
           b.City,
           b.State,
           b.ZIP_Code,
           a.Complaint_ID
    FROM @inputA AS a
         INNER JOIN
             @inputB AS b
         ON a.ZIP_Code == b.ZIP_Code
    WHERE a.ZIP_Code == "36033";


// Output the file
OUTPUT @output
TO "/output/output.txt"
USING Outputters.Tsv(quoting : false);
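
The question concerns comma-separated files, while the script above reads tab-separated ones. As a minimal sketch of the same join for CSV input, the following swaps in the built-in Csv extractor and outputter. The file paths, the column names (Id, Name, Amount) and the presence of a header row are assumptions made for illustration and do not come from the original question:

```usql
// Hypothetical inputs: two comma-separated files, each with a header row,
// sharing a key column named Id. Paths and column names are placeholders.
@fileA =
    EXTRACT Id string,
            Name string
    FROM "/input/fileA.csv"
    USING Extractors.Csv(skipFirstNRows : 1);

@fileB =
    EXTRACT Id string,
            Amount string
    FROM "/input/fileB.csv"
    USING Extractors.Csv(skipFirstNRows : 1);

// Keep only rows whose Id appears in both files.
@merged =
    SELECT a.Id,
           a.Name,
           b.Amount
    FROM @fileA AS a
         INNER JOIN
             @fileB AS b
         ON a.Id == b.Id;

// Write a single CSV file with a header row.
OUTPUT @merged
TO "/output/merged.csv"
USING Outputters.Csv(outputHeader : true);
```

The EXTRACT column list has to match the columns actually present in each file, so a real script would list every column, as the tab-separated example does.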

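The ZIP_Code join script could also be converted into a U-SQL stored procedure that takes the file names and the ZIP code as parameters.

There are of course many ways to achieve this, each with its own advantages and disadvantages. For example, a custom .NET activity may feel more comfortable to someone with a .NET background, but you need some compute to run it. For someone with a SQL/database background who already has an Azure SQL DB in their subscription, importing the files into an Azure SQL database would be a good option.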
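
One way that conversion might look is sketched below. The procedure name dbo.JoinFilesOnZip, the parameter names and the extra output-path parameter are hypothetical; the column lists are copied from the script above:

```usql
// Sketch only: the procedure and parameter names are hypothetical.
CREATE PROCEDURE IF NOT EXISTS dbo.JoinFilesOnZip(
    @fileA string,
    @fileB string,
    @zipCode string,
    @outputFile string)
AS
BEGIN
    // Get raw input from file A
    @inputA =
        EXTRACT Date_received string,
                Product string,
                Sub_product string,
                Issue string,
                Sub_issue string,
                Consumer_complaint_narrative string,
                Company_public_response string,
                Company string,
                State string,
                ZIP_Code string,
                Tags string,
                Consumer_consent_provided string,
                Submitted_via string,
                Date_sent_to_company string,
                Company_response_to_consumer string,
                Timely_response string,
                Consumer_disputed string,
                Complaint_ID string
        FROM @fileA
        USING Extractors.Tsv();

    // Get raw input from file B
    @inputB =
        EXTRACT Provider_ID string,
                Hospital_Name string,
                Address string,
                City string,
                State string,
                ZIP_Code string,
                County_Name string,
                Phone_Number string,
                Hospital_Type string,
                Hospital_Ownership string,
                Emergency_Services string,
                Meets_criteria_for_meaningful_use_of_EHRs string,
                Hospital_overall_rating string,
                Hospital_overall_rating_footnote string,
                Mortality_national_comparison string,
                Mortality_national_comparison_footnote string,
                Safety_of_care_national_comparison string,
                Safety_of_care_national_comparison_footnote string,
                Readmission_national_comparison string,
                Readmission_national_comparison_footnote string,
                Patient_experience_national_comparison string,
                Patient_experience_national_comparison_footnote string,
                Effectiveness_of_care_national_comparison string,
                Effectiveness_of_care_national_comparison_footnote string,
                Timeliness_of_care_national_comparison string,
                Timeliness_of_care_national_comparison_footnote string,
                Efficient_use_of_medical_imaging_national_comparison string,
                Efficient_use_of_medical_imaging_national_comparison_footnote string,
                Location string
        FROM @fileB
        USING Extractors.Tsv();

    // Join on ZIP_Code, filtered to the ZIP code passed in
    @output =
        SELECT b.Provider_ID,
               b.Hospital_Name,
               b.Address,
               b.City,
               b.State,
               b.ZIP_Code,
               a.Complaint_ID
        FROM @inputA AS a
             INNER JOIN
                 @inputB AS b
             ON a.ZIP_Code == b.ZIP_Code
        WHERE a.ZIP_Code == @zipCode;

    OUTPUT @output
    TO @outputFile
    USING Outputters.Tsv(quoting : false);
END;
```

Assuming the procedure was created in the default master database, a later script could call it as master.dbo.JoinFilesOnZip("/input/input48A.txt", "/input/input48B.txt", "36033", "/output/output.txt");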

Regarding "azure - Merging two CSV files in Azure Data Factory using a custom .NET activity", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41872394/
