
hadoop - Grouping and aggregating data by two columns in Pig

Reposted. Author: 可可西里. Updated: 2023-11-01 16:49:24

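I have the data below and need to group it by two columns, then sum the other two fields. Assume the four columns are named OS, device, view, and click. For each combination of OS and device, I want to know how many views and how many clicks it has.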

(2,3346,1,)
(3,3953,1,1)
(25,4840,1,1)
(2,94840,1,1)
(14,0526,1,1)
(37,4864,1,)
(2,7353,1,)

Here is what I have so far:

A is data: OS,device,view,click

B = GROUP A BY (OS,device);

Result = FOREACH B {
GENERATE group AS OS,device, SUM(view) AS visits, SUM(click) AS clicks;};
dump Result;

This does not work. The error message is: Projected field [OS] does not exist in schema: group:tuple(OS:int,device:long),B:bag{:tuple(OS:int,device:long,view:int,click:int)}.

Best answer

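This code is tested; you were missing FLATTEN. After GROUP, the two key columns are wrapped in a single tuple named group, so OS is no longer a top-level field, and the original columns live inside a bag named after the grouped relation, which is why they are referenced as A.view and A.click: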

A = LOAD '/user/root/pig_data' using PigStorage(',') AS (OS, device, view, click);
B = GROUP A BY (OS, device);
RESULT = FOREACH B GENERATE FLATTEN(group) AS (OS, device), SUM(A.view) as views, SUM(A.click) as clicks;
dump RESULT;
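
The sample rows contain empty click values, and the answer loads every column without a type. As a hedged variant, the sketch below declares types and replaces missing values with 0 before grouping. The types are assumptions that are not in the original answer (device is kept as chararray here only so that a value such as 0526 keeps its leading zero), and A_clean is a made-up alias. Untyped fields default to bytearray, which SUM casts to double, so declaring int columns should give integer totals instead. SUM also skips null values, and a group whose clicks are all null would get a null total rather than 0.

```pig
-- Sketch only: the column types below are assumptions, not part of the original answer.
A = LOAD '/user/root/pig_data' USING PigStorage(',')
    AS (OS:int, device:chararray, view:int, click:int);

-- Replace missing view/click values with 0 so every group gets a numeric total.
-- A_clean is a hypothetical alias introduced for this example.
A_clean = FOREACH A GENERATE
    OS,
    device,
    (view IS NULL ? 0 : view) AS view,
    (click IS NULL ? 0 : click) AS click;

B = GROUP A_clean BY (OS, device);

-- Prints the schema of B: a tuple named "group" plus a bag named "A_clean".
DESCRIBE B;

-- FLATTEN turns the group tuple back into two top-level columns.
RESULT = FOREACH B GENERATE
    FLATTEN(group) AS (OS, device),
    SUM(A_clean.view) AS views,
    SUM(A_clean.click) AS clicks;

DUMP RESULT;
```

With this version, a row such as (2,3346,1,) should contribute 0 clicks to its group instead of a null. The DESCRIBE line is optional; it is included because checking the schema after GROUP is a quick way to see why a field such as OS cannot be projected directly.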

Regarding hadoop - Grouping and aggregating data by two columns in Pig, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34099474/
