
hadoop - Pig: how to reference fields after a JOIN and GROUP

Reposted. Author: 可可西里. Updated: 2023-11-01 14:12:50

I have this code in Pig (win, request, and response are just tables loaded directly from the filesystem):

win_request = JOIN win BY bid_id, request BY bid_id;
win_request_response = JOIN win_request BY win.bid_id, response BY bid_id;

win_group = GROUP win_request_response BY (win.campaign_id);

win_count = FOREACH win_group GENERATE group, SUM(win.bid_price);

Basically, I want to sum bid_price after joining and grouping, but I get this error:

Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.

My guess is that I'm not referencing win.bid_price correctly.

Best answer

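When performing multiple joins, I recommend giving your fields unique identifiers (for example, for bid_id). Alternatively, you can use the disambiguation operator '::', but that can get messy.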

wins = LOAD '/user/hadoop/rtb/wins' USING PigStorage(',') AS (f1_w:int, f2_w:int, f3_w:chararray);
reqs = LOAD '/user/hadoop/rtb/reqs' USING PigStorage(',') AS (f1_r:int, f2_r:int, f3_r:chararray);
resps = LOAD '/user/hadoop/rtb/resps' USING PigStorage(',') AS (f1_rp:int, f2_rp:int, f3_rp:chararray);

wins_reqs = JOIN wins BY f1_w, reqs BY f1_r;
wins_reqs_reps = JOIN wins_reqs BY f1_r, resps BY f1_rp;

win_group = GROUP wins_reqs_reps BY (f3_w);

win_sum = FOREACH win_group GENERATE group, SUM(wins_reqs_reps.f2_w);
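
For comparison, here is a rough sketch of the '::' alternative applied to the script from the question. It assumes win, request, and response are already loaded with schemas that include bid_id, and that win also carries campaign_id and bid_price (names taken from the question; their types are not known). The nested prefixes such as win_request::win:: are my assumption about how the aliases stack up after two joins, so it is worth running DESCRIBE to confirm the real names first. As far as I can tell, the original SUM(win.bid_price) fails because, after GROUP, the bag is named after the grouped relation (win_request_response), not after win.

```pig
-- Sketch only: win, request and response are assumed to be loaded with a schema.
win_request = JOIN win BY bid_id, request BY bid_id;

-- After a JOIN, fields carry their source alias as a prefix, so use '::' here, not '.'.
win_request_response = JOIN win_request BY win::bid_id, response BY bid_id;

-- Print the schema to check the exact prefixed field names before relying on them.
DESCRIBE win_request_response;

-- Group on the fully qualified campaign_id that came from win (assumed prefix).
win_group = GROUP win_request_response BY win_request::win::campaign_id;

-- Project bid_price out of the bag named after the grouped relation, then sum it.
win_sum = FOREACH win_group GENERATE
    group AS campaign_id,
    SUM(win_request_response.win_request::win::bid_price) AS total_bid_price;
```

If bid_price was loaded without a declared type, an explicit cast to a numeric type in an earlier FOREACH may also be needed, as the error message suggests.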

Regarding hadoop - Pig: how to reference fields after a JOIN and GROUP, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13145797/
