gpt4 book ai didi

hadoop - Apache Pig 转换顺序

转载 作者:可可西里 更新时间:2023-11-01 16:28:13 26 4
gpt4 key购买 nike

我正在阅读 Alan Gates 的 Pig Programming。

考虑代码:

ratings = LOAD '/user/maria_dev/ml-100k/u.data' AS 
(userID:int, movieID:int, rating:int, ratingTime:int);

metadata = LOAD '/user/maria_dev/ml-100k/u.item' USING PigStorage ('|') AS
(movieID:int, movieTitle:chararray, releaseDate:chararray, imdbLink: chararray);

nameLookup = FOREACH metadata GENERATE
movieID, movieTitle, ToDate(releaseDate, 'dd-MMM-yyyy') AS releaseYear;

nameLookupYear = FOREACH nameLookup GENERATE
movieID, movieTitle, GetYear(releaseYear) AS finalYear;

filterMovies = FILTER nameLookupYear BY finalYear < 1982;

groupedMovies = GROUP filterMovies BY finalYear;

orderedMovies = FOREACH groupedMovies {
sortOrder = ORDER metadata by finalYear DESC;
GENERATE GROUP, finalYear;
};

DUMP orderedMovies;

它指出

"Sorting by maps, tuples or bags produces error".

我想知道如何对分组结果进行排序。

转换是否需要遵循特定的顺序才能起作用?

最佳答案

由于您正在尝试对分组结果进行排序,因此不需要嵌套的 foreach。例如,如果您尝试按标题或发行日期对一年内的每部电影进行排序,则可以使用嵌套的 foreach。尝试照常排序(将 finalYear 称为 group,因为您在上一行中按 finalYear 分组):

orderedMovies = ORDER groupedMovies BY group ASC;

DUMP orderedMovies;

关于hadoop - Apache Pig 转换顺序,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/51060117/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com