java - 在数据集上运行 "aggregate"函数时出现代数错误-6ren

java - 在数据集上运行 "aggregate"函数时出现代数错误

转载作者：可可西里更新时间：2023-11-01 16:22:40

我正在通过 hortonworks.com 上的教程学习 hadoop/pig/hive

我确实试图找到该教程的链接，但不幸的是，它只附带了他们提供给您的 ISA 镜像。它实际上并未托管在他们的网站上。

batting = load 'Batting.csv' using PigStorage(',');
runs = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
grp_data = GROUP runs by (year);
max_runs = FOREACH grp_data GENERATE group as grp,MAX(runs.runs) as max_runs;
join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;
dump join_data;

我已经按照教程中的说明完全复制了他们的代码，我得到了这个输出:

2013-06-14 14:34:37,969 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1.1.3.0.0-107 (rexported) compiled May 20 2013, 03:04:35
2013-06-14 14:34:37,977 [main] INFO  org.apache.pig.Main - Logging error messages to: /hadoop/mapred/taskTracker/hue/jobcache/job_201306140401_0020/attempt_201306140401_0020_m_000000_0/work/pig_1371245677965.log
2013-06-14 14:34:38,412 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /usr/lib/hadoop/.pigbootup not found
2013-06-14 14:34:38,598 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sandbox:8020
2013-06-14 14:34:38,998 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: sandbox:50300
2013-06-14 14:34:40,819 [main] WARN  org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
2013-06-14 14:34:40,827 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: HASH_JOIN,GROUP_BY
2013-06-14 14:34:41,115 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-06-14 14:34:41,160 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
2013-06-14 14:34:41,201 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer - Rewrite: POPackage->POForEach to POJoinPackage
2013-06-14 14:34:41,213 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 3
2013-06-14 14:34:41,213 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees.
2013-06-14 14:34:41,214 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 out of total 3 MR operators.
2013-06-14 14:34:41,214 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2013-06-14 14:34:41,488 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-06-14 14:34:41,551 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-06-14 14:34:41,555 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
2013-06-14 14:34:41,559 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=6398990
2013-06-14 14:34:41,559 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting Parallelism to 1
2013-06-14 14:34:44,244 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5371236206169131677.jar
2013-06-14 14:34:49,495 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5371236206169131677.jar created
2013-06-14 14:34:49,517 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up multi store job
2013-06-14 14:34:49,529 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Key [pig.schematuple] is false, will not generate code.
2013-06-14 14:34:49,530 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Starting process to move generated code to distributed cacche
2013-06-14 14:34:49,530 [main] INFO  org.apache.pig.data.SchemaTupleFrontend - Setting key [pig.schematuple.classes] with classes to deserialize []
2013-06-14 14:34:49,755 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2013-06-14 14:34:50,144 [JobControl] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-06-14 14:34:50,145 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2013-06-14 14:34:50,256 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2013-06-14 14:34:50,316 [JobControl] INFO  com.hadoop.compression.lzo.GPLNativeCodeLoader - Loaded native gpl library
2013-06-14 14:34:50,444 [JobControl] INFO  com.hadoop.compression.lzo.LzoCodec - Successfully loaded & initialized native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3]
2013-06-14 14:34:50,665 [JobControl] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library is available
2013-06-14 14:34:50,666 [JobControl] INFO  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2013-06-14 14:34:50,666 [JobControl] INFO  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library loaded
2013-06-14 14:34:50,680 [JobControl] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2013-06-14 14:34:52,796 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201306140401_0021
2013-06-14 14:34:52,796 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases batting,grp_data,max_runs,runs
2013-06-14 14:34:52,796 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: batting[1,10],runs[2,7],max_runs[4,11],grp_data[3,11] C: max_runs[4,11],grp_data[3,11] R: max_runs[4,11]
2013-06-14 14:34:52,796 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://sandbox:50030/jobdetails.jsp?jobid=job_201306140401_0021
2013-06-14 14:36:01,993 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
2013-06-14 14:36:04,767 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-06-14 14:36:04,768 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201306140401_0021 has failed! Stop running all dependent jobs
2013-06-14 14:36:04,768 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-06-14 14:36:05,029 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2106: Error executing an algebraic function
2013-06-14 14:36:05,030 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-06-14 14:36:05,042 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 

HadoopVersion   PigVersion  UserId  StartedAt   FinishedAt  Features
1.2.0.1.3.0.0-107   0.11.1.1.3.0.0-107  mapred  2013-06-14 14:34:41 2013-06-14 14:36:05 HASH_JOIN,GROUP_BY

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_201306140401_0021   batting,grp_data,max_runs,runs  MULTI_QUERY,COMBINER    Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201306140401_0021_m_000000  

Input(s):
Failed to read data from "hdfs://sandbox:8020/user/hue/batting.csv"

Output(s):

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201306140401_0021   ->  null,
null


2013-06-14 14:36:05,042 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2013-06-14 14:36:05,043 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias join_data
Details at logfile: /hadoop/mapred/taskTracker/hue/jobcache/job_201306140401_0020/attempt_201306140401_0020_m_000000_0/work/pig_1371245677965.log

当切换这部分时:MAX(runs.runs) 到 avg(runs.runs) 然后我遇到一个完全不同的问题:

2013-06-14 14:38:25,694 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.1.1.3.0.0-107 (rexported) compiled May 20 2013, 03:04:35
2013-06-14 14:38:25,695 [main] INFO  org.apache.pig.Main - Logging error messages to: /hadoop/mapred/taskTracker/hue/jobcache/job_201306140401_0022/attempt_201306140401_0022_m_000000_0/work/pig_1371245905690.log
2013-06-14 14:38:26,198 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /usr/lib/hadoop/.pigbootup not found
2013-06-14 14:38:26,438 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sandbox:8020
2013-06-14 14:38:26,824 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: sandbox:50300
2013-06-14 14:38:28,238 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve avg using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /hadoop/mapred/taskTracker/hue/jobcache/job_201306140401_0022/attempt_201306140401_0022_m_000000_0/work/pig_1371245905690.log

有人知道问题出在哪里吗？

最佳答案

我相信很多人都会想到这一点。我将 Eugene 的解决方案与 Hortonworks 的原始代码相结合，以便我们获得教程中特定的准确输出。以下代码有效并产生教程中指定的精确输出:

batting = LOAD 'Batting.csv' using PigStorage(',');
runs_raw = FOREACH batting GENERATE $0 as playerID, $1 as year, $8 as runs;
runs = FILTER runs_raw BY runs > 0;
grp_data = group runs by (year);
max_runs = FOREACH grp_data GENERATE group as grp, MAX(runs.runs) as max_runs;
join_max_run = JOIN max_runs by ($0, max_runs), runs by (year,runs);
join_data = FOREACH join_max_run GENERATE $0 as year, $2 as playerID, $1 as runs;

转储连接数据；

注意:行“runs = FILTER runs_raw BY runs > 0;”比 Hortonworks 提供的内容还要多，感谢 Eugene 分享了我用来修改原始 Hortonworks 代码以使其工作的工作代码。

关于java - 在数据集上运行 "aggregate"函数时出现代数错误，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/17117694/

文章推荐： http - X-Cloud-Trace-Context 的目的是什么？

文章推荐： hadoop - 有没有办法在 pig 中添加日期时间？

npm 安装不起作用 | npm 错误!路径 | npm 错误!代码 | npm 错误!错误 | npm 错误!系统调用 | npm 错误!恩恩特
我已经使用 vue-cli 两个星期了，直到今天一切正常。我在本地建立这个项目。 https://drive.google.com/open?id=0BwGw1zyyKjW7S3RYWXRaX24tQ
python - pytesseract 错误 Windows 错误 [错误 2]
您好，我正在尝试使用 python 库 pytesseract 从图像中提取文本。请找到代码: from PIL import Image from pytesseract import image_
C 错误 TLS 错误
我的错误 /usr/bin/ld: errno: TLS definition in /lib/libc.so.6 section .tbss mismatches non-TLS reference
r - 错误 `contrasts' 错误
我已经训练了一个模型，我正在尝试使用 predict函数但它返回以下错误。 Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]])
postgresql - PowerBI 直接查询连接到 PostgreSQL 错误。 OLE 或 ODBC 错误 : [Expression. 错误] 我们无法将表达式折叠到数据源
根据Microsoft DataConnectors的信息我想通过 this ODBC driver 创建一个从 PowerBi 到 PostgreSQL 的连接器使用直接查询。我重用了 Micros
java - Android MediaPlayer 错误(在状态 1 中开始调用；错误 (-38, 0)；错误 (-38,0))
我已经为 SoundManagement 创建了一个包，其中有一个扩展 MediaPlayer 的类。我希望全局控制这个变量。这是我的代码: package soundmanagement; impo
heroku - PG::错误:错误:Heroku的内存不足
我在Heroku上部署了一个应用程序。我正在使用免费服务。我经常收到以下错误消息。 PG::Error: ERROR: out of memory 如果刷新浏览器，就可以了。但是随后，它又随机发生
.htaccess - .htaccess 错误，错误 500
我正在运行 LAMP 服务器，这个 .htaccess 给我一个 500 错误。其作用是过滤关键字并重定向到相应的域名。 Options +FollowSymLinks RewriteEngine
robocopy 错误，错误 32 (0x00000020)
我有两个驱动器 A 和 B。使用 python 脚本，我在“A”驱动器中创建一些文件，并运行 powerscript，该脚本以 1 秒的间隔将驱动器 A 中的所有文件复制到驱动器 B。我在 powe
postgresql 错误 - 错误 : input is out of range
下面的函数一直返回这个错误信息。我认为可能是 double_precision 字段类型导致了这种情况，我尝试使用 CAST，但要么不是这样，要么我没有做对...帮助？这是错误: ERROR: i
mysql - 错误 1064 MySQL 错误
这个问题已经有答案了: Syntax error due to using a reserved word as a table or column name in MySQL (1 个回答) 已关闭
mysql - mysql 错误(错误 1136)
我的数据库有这个小问题。我创建了一个表“articoli”，其中包含商品的品牌、型号和价格。每篇文章都由一个 id (ID_ARTICOLO)` 定义，它是一个自动递增字段。好吧，现在当我尝试插
c++ - 错误 C2228、错误 C2275
我是新来的。我目前正在 DeVry 在线学习中级 C++ 编程。我们正在使用 C++ Primer Plus 这本书，到目前为止我一直做得很好。我的老师最近向我们扔了一个曲线球。我目前的任务是这样的:
c++ - 错误 LNK2019 错误 C++
这个问题在这里已经有了答案: What is an undefined reference/unresolved external symbol error and how do I fix it?
html - 奇怪的 IE7 错误/错误
我的网站中有一段代码有问题；此错误仅发生在 Internet Explorer 7 中。我没有在这里发布我所有的 HTML/CSS 标记，而是发布了网站的一个版本 here . 如您所见，我在列中有
node.js - 错误!错误 : EPERM,
如果尝试在 USB 设备上构建 node.js 应用程序时在我的树莓派上使用 npm 时遇到一些问题。 package.json 看起来像这样: { "name" : "node-todo",
python - 无 Python 错误/错误？
在 Python 中，您有 None单例，在某些情况下表现得很奇怪: >>> a = None >>> type(a) >>> isinstance(a,None) Traceback (most
java - Android Studio 错误 - 错误 :java. util.concurrent.ExecutionException : com. android.tools.aapt2.Aapt2Exception:AAPT 错误
这是我的 build.gradle (Module:app) 文件: apply plugin: 'com.android.application' android { compileSdkV
android - 任务 ':app:compileDebugJavaWithJavac' 执行失败。错误 :(2055, 52) 错误 : ';' expected Error:(2055, 59) 错误:<标识符> 预期
我是 android 的新手，我的项目刚才编译和运行正常，但在我尝试实现抽屉导航后，它给了我这个错误 FAILURE: Build failed with an exception. What wen
PHP 7.2.25 错误!= 错误？
谁能解释一下？我想我正在做一些非常愚蠢的事情，并且急切地等待着启蒙。我得到这个输出: phpversion() == 7.2.25-1+0~20191128.32+debian8~1.gbp108

可可西里

个人简介

我是一名优秀的程序员,十分优秀！

作者热门文章

滴滴打车优惠券免费领取

全站热门文章

首页

博学

6Ren·AI

商城

java - 在数据集上运行 "aggregate"函数时出现代数错误