gpt4 book ai didi

hadoop - pig : Select records from a relaltion only if it is present in another relation

转载 作者:可可西里 更新时间:2023-11-01 15:15:51 26 4
gpt4 key购买 nike

我有以下电影数据库的数据集:

Ratings: UserID, MovieID, Rating
Movies: MovieID, Genre

我使用以下方法过滤掉类型为“ Action ”或“ war ”的电影:

movie_filter = filter Movies by (genre matches '.*Action.*') OR (genre matches '.*War.*');

现在,我必须计算 war 片或 Action 片的平均收视率。但是评级存在于评级文件中。为此,我使用查询:

movie_groups = GROUP movie_filter BY MovieID;

result = FOREACH movie_groups GENERATE Ratings.MovieID, AVG(Ratings.rating);

然后我将结果存储在一个目录位置。但是当我运行程序时,出现以下错误:

Could not infer the matching function for org.apache.pig.builtin.AVG as multiple or none of them fit. Please use an explicit cast.

谁能告诉我我做错了什么?提前致谢。

最佳答案

您似乎缺少连接语句,该语句会在 MovieID 列上连接您的两个数据集(评级和电影)。我模拟了一些测试数据,并在下面提供了一些示例代码。



movie_avg.pig

ratings = LOAD 'movie_ratings.txt' USING PigStorage(',') AS (user_id:chararray, movie_id:chararray, rating:int);
movies = LOAD 'movie_data.txt' USING PigStorage(',') AS (movie_id:chararray,genre:chararray);

movies_filter = FILTER movies BY (genre MATCHES '.*Action.*' OR genre MATCHES '.*War.*');

movies_join = JOIN movies_filter BY movie_id, ratings BY movie_id;

movies_cleanup = FOREACH movies_join GENERATE movies_filter::movie_id AS movie_id, ratings::rating as rating;

movies_group = GROUP movies_cleanup by movie_id;

data = FOREACH movies_group GENERATE group, AVG(movies_cleanup.rating);

dump data;



movie_avg.pig 的输出

(Jarhead,3.0)
(Platoon,4.333333333333333)
(Die Hard,3.0)
(Apocolypse Now,4.5)
(Last Action Hero,2.0)
(Lethal Weapon, 4.0)



movie_data.txt

Scrooged,Comedy
Apocolypse Now,War
Platoon,War
Guess Whos Coming To Dinner,Drama
Jarhead,War
Last Action Hero,Action
Die Hard,Action
Lethal Weapon,Action
My Fair Lady,Musical
Frozen,Animation



movie_ratings.txt

12345,Scrooged,4
12345,Frozen,4
12345,My Fair Lady,5
12345,Guess Whos Coming To Dinner,5
12345,Platoon,3
12345,Jarhead,2
23456,Platoon,5
23456,Apocolypse Now,4
23456,Die Hard,3
23456,Last Action Hero,2
34567,Lethal Weapon,4
34567,Jarhead,4
34567,Apocolypse Now,5
34567,Platoon,5
34567,Frozen,5

关于hadoop - pig : Select records from a relaltion only if it is present in another relation,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22490397/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com