hadoop - Apache PIG - Group by ID with count(*) and create a rank

Reposted. Author: 可可西里. Updated: 2023-11-01 15:27:48

Hi experts,

I have this dataset:

Field_A Field_B    DATE
John 1 01-01-2016
John 1 05-01-2016
Cate 1 05-01-2016
Cate 4 01-01-2016
Cate 6 05-01-2016
Perdi 4 01-01-2016

I am trying to compute count(*) for each Field_A and create a rank based on Field_A and the date. Basically, I want to return this:

Field_A Count   Rank    Field_B
John 2 1 1
John 2 2 1
Cate 3 3 1
Cate 3 4 4
Cate 3 3 6
Perdi 1 5 4

To do that, I am trying the following code:

DATA  = load '...'
AS
(Field_A:Int,
FIELD_B:Int,
DATE:CHARARRAY);
A = rank DATA BY Field_A;
B = GROUP A BY $0;
C = foreach B {
CNT = COUNT(A.Field_A);
generate $0, CNT;
}
D = join A by $0, C by $0;
E = rank D BY DATE,Field_A DENSE;
F = foreach E generate $0 AS RANK,Field_A,CNT;
DUMP F;

But I get the following error:

<file script.pig, line 35, column 69> Invalid field projection. Projected field [CNT] does not exist in schema;

How can I fix this?

Thanks a lot!

Best answer

C = foreach B {
generate group as Field_A, COUNT(A) as CNT;
}
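
For context, here is a minimal sketch of how that fix might fit into the whole script. It is not part of the original answer and rests on several assumptions: Field_A is loaded as chararray because the sample values are names, the grouping is done directly on Field_A rather than on a rank column, and the relation name D2 and the field name rnk are made up for illustration. The load path is left elided as in the question. The key point is that a column produced by GENERATE only has a name if one is assigned with AS, so CNT can be projected later only once it has been named.

```pig
-- Sketch only; the load path is elided in the question and stays elided here.
-- Assumption: Field_A holds names, so it is declared chararray instead of int.
DATA = LOAD '...' AS (Field_A:chararray, Field_B:int, DATE:chararray);

-- Count rows per Field_A; AS gives the generated columns names
-- so that later statements can refer to CNT.
B = GROUP DATA BY Field_A;
C = FOREACH B GENERATE group AS Field_A, COUNT(DATA) AS CNT;

-- Attach the per-name count back onto every original row.
D = JOIN DATA BY Field_A, C BY Field_A;

-- Both join inputs carry a Field_A, so project with the relation prefix
-- and rename to plain field names before ranking.
D2 = FOREACH D GENERATE DATA::Field_A AS Field_A,
                        DATA::Field_B AS Field_B,
                        DATA::DATE    AS DATE,
                        C::CNT        AS CNT;

-- DATE is a chararray in dd-mm-yyyy form, so it is compared as text.
E = RANK D2 BY DATE, Field_A DENSE;

-- RANK prepends the rank as the first column ($0).
F = FOREACH E GENERATE Field_A, CNT, $0 AS rnk, Field_B;
DUMP F;
```

Projecting into D2 before RANK keeps the field names unambiguous after the join. The rank values this produces depend on the sort order of DATE and Field_A, so they may not match the numbers listed in the question.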

Regarding "hadoop - Apache PIG - Group by ID with count(*) and create a rank", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41214427/
