
hadoop - How to remove tuples whose value is below the average in Pig

Reposted. Author: 可可西里. Updated: 2023-11-01 16:47:28

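I have a bag with 3 fields (id, name, and post_num), and I want to remove the tuples whose post_num is less than the average post_num for their name. For example, given these 4 records: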

1,Dav,5
2,Dav,6
3,Dav,4
4,Ed,1

The third record should be dropped, because the average post_num for Dav is 5.

Can I do this without a UDF?

Best answer

-- ## Suppose you have
-- 1000,SMITH,123
-- 1001,JOHN,452
-- 1002,TWAIN,125
-- 1003,HARDY,124
-- 1004,CHILD,785
-- 1005,CHILD,639
-- 1006,DAVIS,89
-- 1007,DAVIS,173
-- 1008,MIKE,420
-- 1009,DENNIS,562
-- 1010,CHILD,638

-- ## Then try this in the Pig CLI:

data = LOAD '/mnt/e_drive/temp/csdata.csv' USING PigStorage(',') as (id:int, name:chararray, post_num:int);
-- data: {id: int,name: chararray,post_num: int}

grpData = GROUP data BY name;
-- grpData: {group: chararray,data: {(id: int,name: chararray,post_num: int)}}

avgData = foreach grpData generate FLATTEN(data), AVG(data.post_num) as avg_post_num;
--avgData: {data::id: int,data::name: chararray,data::post_num: int,avg_post_num: double}

filterData = filter avgData by (double) data::post_num >= avg_post_num;
--filterData: {data::id: int,data::name: chararray,data::post_num: int,avg_post_num: double}

requiredData = foreach filterData generate data::id as id, data::name as name, data::post_num as post_num;
--requiredData: {id: int,name: chararray,post_num: int}



-- TO Debug ---------------------
dump avgData;

-- (1001,JOHN,452,452.0)
-- (1008,MIKE,420,420.0)
-- (1010,CHILD,638,687.3333333333334)
-- (1005,CHILD,639,687.3333333333334)
-- (1004,CHILD,785,687.3333333333334)
-- (1007,DAVIS,173,131.0)
-- (1006,DAVIS,89,131.0)
-- (1003,HARDY,124,124.0)
-- (1000,SMITH,123,123.0)
-- (1002,TWAIN,125,125.0)
-- (1009,DENNIS,562,562.0)


dump requiredData;
--(1001,JOHN,452)
--(1008,MIKE,420)
--(1004,CHILD,785)
--(1007,DAVIS,173)
--(1003,HARDY,124)
--(1000,SMITH,123)
--(1002,TWAIN,125)
--(1009,DENNIS,562)
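
The same group, average, and filter pattern can be applied to the four-record sample from the question. What follows is a minimal sketch, not a tested script: the file name `posts.csv` and the relation names are hypothetical, and the names in the sample are assumed to be `Dav` and `Ed`. It uses only built-in operators and the built-in `AVG` function, so no UDF should be needed.

```pig
-- posts.csv is a hypothetical file assumed to contain:
-- 1,Dav,5
-- 2,Dav,6
-- 3,Dav,4
-- 4,Ed,1
posts = LOAD 'posts.csv' USING PigStorage(',') AS (id:int, name:chararray, post_num:int);

-- One group per name; each group carries a bag of that name's tuples.
byName = GROUP posts BY name;

-- Flatten the bag back into rows and attach the per-name average to every row.
withAvg = FOREACH byName GENERATE FLATTEN(posts), AVG(posts.post_num) AS avg_post_num;

-- Keep rows at or above their name's average.
kept = FILTER withAvg BY (double) posts::post_num >= avg_post_num;

-- Drop the helper column and restore the original field names.
result = FOREACH kept GENERATE posts::id AS id, posts::name AS name, posts::post_num AS post_num;

DUMP result;
-- Dav's average is (5 + 6 + 4) / 3 = 5.0, so (3,Dav,4) is dropped.
-- Ed's single row equals its own average, so (4,Ed,1) is kept.
```

The comparison uses `>=` because the question asks to drop only the tuples that are below the average; switching to `>` would also drop tuples equal to the average, including every name that has a single record.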

Regarding "hadoop - How to remove tuples whose value is below the average in Pig", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/35931371/
