
hadoop - Pig: grouping users while keeping other fields

Reposted | Author: 行者123 | Updated: 2023-12-02 21:42:28

I suspect this question is similar to:
Selecting fields after grouping in Pig
but here is my version of the problem, with the following made-up sample data:

user_name, movie_name, company, rating
Jim, Jaws, A, 4
Jim, Baseball, B, 4
Matt, Halo, A, 5
Matt, Baseball, B, 4
Matt, History of Chairs, B, 3.5
Pat, History of Chairs, B, 3
John, History of Chairs, B, 2
Frank, Battle Tanks, A, 3
Frank, History of Chairs, B, 5


How can I group all the movies each user has watched without losing the other information, such as company and rating?
I want to add together each user's ratings for company A movies and company B movies, so that

Jim, Jaws, Baseball, 8
Matt, Halo, Baseball, 9
Frank, Battle Tanks, History of Chairs, 8


would be the output, in the format:
user, company A movie, company B movie, rating
I start with a load, followed by:
r1 = LOAD 'data.csv' USING PigStorage(',') as (user_name:chararray, movie_name:chararray, company_name:chararray, rating:int);
r2 = group r1 by user_name;
r3 = foreach r2 generate group as user_name, flatten(r1);
r4A = filter r3 by company_name == 'A';
r4B = filter r3 by company_name == 'B';
But I get something like:

(Frank,Frank,Battle Tanks,A,3)


Then I planned to combine r4A and r4B with the summed ratings. But I'm not sure whether the duplicated user_name will cause problems.
Is this the right approach? Is there a better way?
Any help would be appreciated!
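To see why the group key is duplicated, the r2/r3 steps can be mimicked in plain Python (an illustrative sketch, not Pig; the names `rows`, `groups`, and `r3` are made up for this example). FLATTEN(r1) emits every field of each grouped tuple, including its own user_name, alongside the group key:

```python
from collections import defaultdict

# One row of the sample data: (user_name, movie_name, company, rating)
rows = [("Frank", "Battle Tanks", "A", 3)]

# r2 = GROUP r1 BY user_name: collect each user's full tuples.
groups = defaultdict(list)
for row in rows:
    groups[row[0]].append(row)

# r3 = FOREACH r2 GENERATE group AS user_name, FLATTEN(r1);
# FLATTEN keeps every field of the original tuple, including its
# user_name, so the group key appears twice in each output tuple.
r3 = [(user, *row) for user, tuples in groups.items() for row in tuples]
# r3 contains ("Frank", "Frank", "Battle Tanks", "A", 3)
```

This duplication is harmless but redundant; projecting only the fields you need after the FLATTEN would avoid carrying it forward.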

Best Answer

Can you try this?

Input:

Jim,Jaws,A,4
Jim,Baseball,B,4
Matt,Halo,A,5
Matt,Baseball,B,4
Matt,History of Chairs,B,3.5
Pat,History of Chairs,B,3
John,History of Chairs,B,2
Frank,Battle Tanks,A,3
Frank,History of Chairs,B,5

PigScript:
A = LOAD 'input' USING PigStorage(',') AS (user_name:chararray, movie_name:chararray, company:chararray, rating:float);
B = GROUP A BY user_name;
C = FOREACH B {
        filterCompanyA = FILTER A BY company == 'A';
        sumA = SUM(filterCompanyA.rating);

        filterCompanyB = FILTER A BY company == 'B';
        sumB = SUM(filterCompanyB.rating);

        GENERATE group AS user,
                 FLATTEN(REPLACE(BagToString(filterCompanyA.movie_name),'_',',')) AS companyA,
                 FLATTEN(REPLACE(BagToString(filterCompanyB.movie_name),'_',',')) AS companyB,
                 (((sumA is null) ? 0 : sumA) + ((sumB is null) ? 0 : sumB)) AS Rating;
}

D = FOREACH C GENERATE user,companyA,companyB,Rating;
DUMP D;

Output:
(Jim,Jaws,Baseball,8.0)
(Pat,,History of Chairs,3.0)
(John,,History of Chairs,2.0)
(Matt,Halo,Baseball,History of Chairs,12.5)
(Frank,Battle Tanks,History of Chairs,8.0)

In the output above, Pat and John have not watched any company A movie, so that field is null, i.e. empty.
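The same per-user aggregation can be sketched in plain Python to check the logic (an illustration only, not Pig; the names `rows`, `groups`, and `result` are made up for this example). Grouping stands in for GROUP BY, list comprehensions for the nested FILTERs, `",".join` for BagToString plus REPLACE, and `sum` over an empty list yields 0, matching the null checks:

```python
from collections import defaultdict

# Sample data from the question: (user, movie, company, rating)
rows = [
    ("Jim", "Jaws", "A", 4.0),
    ("Jim", "Baseball", "B", 4.0),
    ("Matt", "Halo", "A", 5.0),
    ("Matt", "Baseball", "B", 4.0),
    ("Matt", "History of Chairs", "B", 3.5),
    ("Pat", "History of Chairs", "B", 3.0),
    ("John", "History of Chairs", "B", 2.0),
    ("Frank", "Battle Tanks", "A", 3.0),
    ("Frank", "History of Chairs", "B", 5.0),
]

# GROUP A BY user_name: collect each user's rows in input order.
groups = defaultdict(list)
for user, movie, company, rating in rows:
    groups[user].append((movie, company, rating))

# Nested FOREACH: per user, split by company, join the movie names,
# and add the two rating sums (an absent company contributes 0).
result = []
for user, items in groups.items():
    movies_a = [m for m, c, r in items if c == "A"]
    movies_b = [m for m, c, r in items if c == "B"]
    sum_a = sum(r for m, c, r in items if c == "A")
    sum_b = sum(r for m, c, r in items if c == "B")
    result.append((user, ",".join(movies_a), ",".join(movies_b), sum_a + sum_b))
```

Note that for Matt the company B field becomes the single string "Baseball,History of Chairs", which is why his row in the Pig output above prints with what looks like an extra column.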

Regarding hadoop - Pig: grouping users while keeping other fields, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27697131/
