
hadoop - How to join two relations on multiple fields in Pig

Reposted. Author: 可可西里. Updated: 2023-11-01 16:38:25

I have two CSV files:

1- Fertility.csv:

[image not preserved]

2- Life Expectency.csv:

[image not preserved]

I want to join them in Pig so that the result looks like this:

[image not preserved]

I am new to Pig and could not get the right result, but here is my code:

fertility = LOAD 'fertility' USING org.apache.hcatalog.pig.HCatLoader();

lifeExpectency = LOAD 'lifeExpectency' USING org.apache.hcatalog.pig.HCatLoader();

A = JOIN fertility by country, lifeExpectency by country;

B = JOIN fertility by year, lifeExpectency by year;

C = UNION A,B;

DUMP C;

This is the result of my code:

[image not preserved]

Best answer

You can join on both country and year in a single JOIN, then select only the columns needed for the final output.

fertility = LOAD 'fertility' USING org.apache.hcatalog.pig.HCatLoader();
lifeExpectency = LOAD 'lifeExpectency' USING org.apache.hcatalog.pig.HCatLoader();

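-- Join on the composite key (country, year) so rows match only when both fields agree.
A = JOIN fertility by (country,year), lifeExpectency by (country,year);
-- Keep one copy of the key columns plus the two measures; '::' names the source relation of each field.
B = FOREACH A GENERATE fertility::country,fertility::year,fertility::fertility,lifeExpectency::lifeExpectency;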
DUMP B;
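
To see why the composite key matters, here is a small Python sketch that mimics both approaches. It is only an illustration, not Pig code and not part of the original answer: the sample rows, the `inner_join` helper, and the variable names are all made up for the example.

```python
# Hypothetical sample rows: (country, year, value). Not taken from the original CSV files.
fertility = [
    ("Japan", 2000, 1.36),
    ("Japan", 2001, 1.33),
    ("Kenya", 2000, 5.00),
]
life_expectancy = [
    ("Japan", 2000, 81.1),
    ("Japan", 2001, 81.4),
    ("Kenya", 2000, 52.0),
]


def inner_join(left, right, key):
    """Inner join two lists of tuples; key(row) returns the join key (hypothetical helper)."""
    index = {}
    for r in right:
        index.setdefault(key(r), []).append(r)
    # Each left row is paired with every right row that shares its key.
    return [l + r for l in left for r in index.get(key(l), [])]


# The question's approach: join on country, join on year, then concatenate the results.
by_country = inner_join(fertility, life_expectancy, key=lambda row: row[0])
by_year = inner_join(fertility, life_expectancy, key=lambda row: row[1])
union = by_country + by_year  # like Pig's UNION, duplicates are kept

# The answer's approach: one join on the composite key (country, year).
by_both = inner_join(fertility, life_expectancy, key=lambda row: (row[0], row[1]))
# Project the joined 6-field rows down to the 4 wanted columns.
result = [(c, y, f, le) for (c, y, f, _c, _y, le) in by_both]

print(len(union), "rows from the two single-key joins")
print(len(result), "rows from the composite-key join")
for row in result:
    print(row)
```

With these sample rows, the two single-key joins together produce 10 rows, because every Japan row pairs with every other Japan row and every year-2000 row pairs with every other year-2000 row. The composite-key join produces exactly 3 rows, one per (country, year) pair.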

Regarding hadoop - how to join two relations on multiple fields in Pig, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46875177/
