
hadoop - Populate the maximum value into all records of the same key using Pig

Reposted. Author: 行者123. Updated: 2023-12-02 20:57:23

I have the data set below:

key|value
---------
key1|10
key1|20
key1|30
key2|50
key2|70

I need to populate a new column with the maximum of the "value" column for the same key.

The output must be:
key1|10|30
key1|20|30
key1|30|30
key2|50|70
key2|70|70

Below is my Pig script, but I am facing issues with it.
A = LOAD 'input.txt' using PigStorage('|');
B = foreach A generate $0,$1,max($1);


grunt> A = LOAD 'input.txt' using PigStorage('|');
grunt> B = foreach A generate $0,$1,max($1);

2017-05-26 06:48:02,347 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve max using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Best answer

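The code below should do it. Remember that you must first GROUP the relation before using aggregate functions such as MAX, MIN, or AVG, because they operate on a bag of values rather than on a single field. Note also that Pig function names are case sensitive: the built-in is MAX, and lowercase max cannot be resolved, which is what ERROR 1070 reports.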

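-- Load the pipe-delimited input with named, typed fields
A = LOAD 'input.txt' USING PigStorage('|') AS (id: chararray, val: int);
-- Collect all rows that share the same id into one bag
B = GROUP A BY id;
-- One row per id, carrying the maximum val found in that id's bag
C = FOREACH B GENERATE group AS id, MAX(A.val) AS maxval;
-- Attach each id's maximum back onto every original row
D = JOIN A BY id, C BY id;
E = FOREACH D GENERATE A::id, A::val, C::maxval;
DUMP E;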

Run this and you should get:
(key1,30,30)
(key1,20,30)
(key1,10,30)
(key2,70,70)
(key2,50,70)
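
As a possible alternative, the JOIN step can be avoided by flattening the grouped bag next to its aggregate. The sketch below is not part of the original answer; it assumes the same pipe-delimited input.txt as the question, and the field names id, val, and maxval are chosen here for illustration.

```pig
-- Assumption: input.txt holds pipe-delimited rows such as key1|10
A = LOAD 'input.txt' USING PigStorage('|') AS (id:chararray, val:int);

-- Each group holds a bag named A with all rows for that id
B = GROUP A BY id;

-- FLATTEN(A) emits one output row per original row in the bag,
-- and MAX(A.val) is repeated alongside each of them
C = FOREACH B GENERATE FLATTEN(A), MAX(A.val) AS maxval;

DUMP C;
```

Each input row should come out with its key's maximum appended. The order of rows within a group is not guaranteed, so add an ORDER BY if a specific order matters.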

For more on "hadoop - Populate the maximum value into all records of the same key using Pig", see the similar question on Stack Overflow: https://stackoverflow.com/questions/44195226/
