
json - Hadoop Pig with nested JSON

Reposted. Author: 行者123. Updated: 2023-12-02 20:46:21

I have a list of movies, each with the ratings that users gave it.

    {"_id":59607,"title":"King Corn (2007)",
"genres":["Documentary"],
"ratings":[ {"userId":1860,"rating":3},
{"userId":9970,"rating":3.5},
{"userId":16929,"rating":1.5},
{"userId":23473,"rating":4},
{"userId":23733,"rating":4},
{"userId":27584,"rating":3},
{"userId":28232,"rating":4},
{"userId":29482,"rating":3},
{"userId":40976,"rating":5},
{"userId":44631,"rating":4},
{"userId":47613,"rating":3},
{"userId":49763,"rating":3},
{"userId":58160,"rating":4.5},
{"userId":62249,"rating":3},
{"userId":65923,"rating":4},
{"userId":67507,"rating":4},
{"userId":68259,"rating":3.5},
{"userId":70331,"rating":5},
{"userId":71420,"rating":3.5}
]
}

I need to count how many ratings each user has submitted. This is how I tried to get at the ratings:
a = load '/movies_1m.json' using JsonLoader('id:int, title : chararray, genres : { ( genre : chararray ) }, ratings: { ( userId : int, rating: float) } ');

Then:
b = FOREACH a GENERATE FLATTEN(ratings);

DESCRIBE gives me the following:
b: {ratings::userId: int,ratings::rating: float}

To count ratings per user, I need to access the fields inside ratings, and this is where I am stuck. I tried this:
c = FOREACH b GENERATE COUNT(ratings);

It gives me an error.

I need to get something like this:
 {userId: int, rating: float}

Best answer

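You need to GROUP before you can COUNT, because COUNT is an aggregate function that operates on a bag. After FLATTEN, relation b no longer has a bag named ratings, only the fields ratings::userId and ratings::rating, so COUNT(ratings) fails. Grouping b by user ID produces one bag of rows per user, which COUNT can then consume.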

b = FOREACH a GENERATE FLATTEN(ratings);
gr = GROUP b by ratings::userId;
c = FOREACH gr GENERATE group,COUNT($1);
\d c

Output

Note that no user appears more than once in your sample, so every count is 1.
(1860,1)
(9970,1)
(16929,1)
(23473,1)
(23733,1)
(27584,1)
(28232,1)
(29482,1)
(40976,1)
(44631,1)
(47613,1)
(49763,1)
(58160,1)
(62249,1)
(65923,1)
(67507,1)
(68259,1)
(70331,1)
(71420,1)
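
As a rough cross-check of the logic (not of Pig itself), the same flatten, group, count pipeline can be sketched in plain Python. This is only an illustration under stated assumptions: the input is assumed to hold one JSON object per line, the first record is a shortened copy of the sample from the question, and the second record is hypothetical, added so that one user appears twice. The function name `count_ratings_per_user` is likewise made up for this sketch.

```python
import json
from collections import Counter

# Two records in the shape shown in the question. The first is a shortened
# copy of the sample; the second is HYPOTHETICAL, added so user 1860 repeats.
lines = [
    '{"_id":59607,"title":"King Corn (2007)","genres":["Documentary"],'
    '"ratings":[{"userId":1860,"rating":3},{"userId":9970,"rating":3.5}]}',
    '{"_id":1,"title":"Hypothetical Movie","genres":["Drama"],'
    '"ratings":[{"userId":1860,"rating":4}]}',
]


def count_ratings_per_user(json_lines):
    """Flatten every movie's ratings, then count rows per userId."""
    counts = Counter()
    for line in json_lines:
        movie = json.loads(line)
        for rating in movie["ratings"]:    # plays the role of FLATTEN(ratings)
            counts[rating["userId"]] += 1  # plays the role of GROUP BY + COUNT
    return dict(counts)


result = count_ratings_per_user(lines)
print(sorted(result.items()))  # [(1860, 2), (9970, 1)]
```

With the repeated user in the second record, the count for 1860 becomes 2, which is the behaviour the grouped COUNT should show in Pig once the real data contains users who rated more than one movie.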

For "json - Hadoop Pig with nested JSON", the original question is on Stack Overflow: https://stackoverflow.com/questions/47890335/
