hadoop - Unable to perform a SUM operation in Pig

Reposted. Author: 行者123. Updated: 2023-12-02 20:46:33

I am trying to sum data in Pig, but it does not accept an explicit type cast. I also tried replacing (int) with (double) when performing the sum.


drivers = LOAD '/sachin/drivers.csv' USING PigStorage(',');
time = LOAD '/sachin/timesheet.csv' USING PigStorage(',');
drivdata = FILTER drivers BY $0>1;
timedata = filter time by $0>0;
drivgrp = group timedata by $0;
drivinfo = foreach drivgrp generate group as id , SUM(timedata.$2) as totalhr , SUM(timedata.$3) as totmillogged;
drivfinal = foreach drivdata generate $0 as id , $1 as name;
result = join drivfinal by id , drivinfo by id;
finalres = foreach result generate $0 as id, $1 as name, $3 as hrslogged, $4 as mileslogged;
summile = foreach finalres generate (int)SUM(mileslogged);
DUMP summile;

Error message:
grunt> exec /home/sachin/sec.pig
2017-12-13 21:57:58,812 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 1 time(s).
2017-12-13 21:57:58,854 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:58,996 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,036 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,080 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,121 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,192 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_INT 2 time(s).
2017-12-13 21:57:59,246 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: <line 10, column 41> Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.
Details at logfile: /home/sachin/pig_1513175202309.log
grunt>

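What I am actually trying to do is take the top 5 drivers and, for each of them, find the miles logged and that driver's miles as a percentage of the total miles logged, then store the result in HDFS.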

Dataset link: https://raw.githubusercontent.com/hortonworks/data-tutorials/master/tutorials/hdp/how-to-process-data-with-apache-pig/assets/driver_data.zip
Can anyone help me solve this, or help me understand what is going wrong here?

Best answer

You have to cast mileslogged first, and then call the SUM function:

finalres = foreach result generate $0 as id, $1 as name, $3 as hrslogged, (int)$4 as mileslogged; 
summile = foreach finalres generate SUM(mileslogged);
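
Note that SUM is an aggregate function that expects a bag, whereas inside a plain foreach over finalres the field mileslogged is a single value per row, so the cast on its own may not be enough to get past ERROR 1045. A minimal sketch of one way to get a grand total, assuming result is the joined relation from the question; the aliases allrows and totalmiles are hypothetical names introduced here:

```pig
-- Cast the joined miles column once, as in the answer above.
finalres = FOREACH result GENERATE $0 AS id, $1 AS name, $3 AS hrslogged, (int)$4 AS mileslogged;

-- GROUP ... ALL collapses the whole relation into one group,
-- so finalres.mileslogged becomes a bag that SUM can consume.
allrows = GROUP finalres ALL;

-- allrows and totalmiles are illustrative names, not from the original post.
summile = FOREACH allrows GENERATE SUM(finalres.mileslogged) AS totalmiles;
DUMP summile;
```

The design choice here is to keep the per-driver relation (finalres) untouched and compute the total in a separate one-row relation, which can then be reused for the percentage calculation the question describes.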

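I also notice that you are not specifying data types in your load statements. The default data type is bytearray, and I doubt you will get correct results unless you explicitly cast the fields in the subsequent steps.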
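
One way to act on this is to declare a schema in the LOAD statement so the fields are typed from the start. This is a sketch only: the field names are hypothetical, and the types are assumptions based on how the question uses columns $0, $2 and $3 of timesheet.csv.

```pig
-- Declare types at load time instead of relying on the bytearray default.
-- Field names are illustrative; the meaning of column 1 (called week here) is a guess.
time = LOAD '/sachin/timesheet.csv' USING PigStorage(',')
       AS (driverid:int, week:int, hours:int, miles:int);

-- Values that cannot be parsed as int (such as a header line) load as null.
timedata = FILTER time BY driverid IS NOT NULL;

drivgrp = GROUP timedata BY driverid;

-- hours and miles are already ints, so SUM needs no explicit cast here.
drivinfo = FOREACH drivgrp GENERATE group AS id,
           SUM(timedata.hours) AS totalhr,
           SUM(timedata.miles) AS totmillogged;
```

With typed fields, the IMPLICIT_CAST_TO_INT warnings from comparisons such as $0>0 should no longer be triggered by these statements, since no comparison against an untyped column remains.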

Regarding "hadoop - Unable to perform a SUM operation in Pig", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47798034/
