hadoop - Hadoop Streaming error when using Python MapReduce

Reposted. Author: 行者123. Updated: 2023-12-02 21:53:54

I am new to the Hadoop environment. Do you have any idea how to fix this error, or what is causing it?

hduser@intel-HP-Pavilion-g6-Notebook-PC:~/hduser/hadoop$ sudo ./bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar  -file /home/hduser/map.py  -mapper /home/hduser/map.py -file /home/hduser/red.py -reducer /home/hduser/red.py  -input /home/hduser/tmp/cddb.txt  -output /home/hduser/op1
packageJobJar: [/home/hduser/map.py, /home/hduser/red.py] [] /tmp/streamjob7455767556382290755.jar tmpDir=null
13/06/20 12:43:55 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/20 12:43:55 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/20 12:43:55 INFO mapred.FileInputFormat: Total input paths to process : 1
13/06/20 12:43:55 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir.
13/06/20 12:43:56 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-root/mapred/local]
13/06/20 12:43:56 INFO streaming.StreamJob: Running job: job_local_0001
13/06/20 12:43:56 INFO streaming.StreamJob: Job running in-process (local Hadoop)
13/06/20 12:43:56 INFO util.ProcessTree: setsid exited with exit code 0
13/06/20 12:43:56 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@e2081
13/06/20 12:43:56 INFO mapred.MapTask: numReduceTasks: 1
13/06/20 12:43:56 INFO mapred.MapTask: io.sort.mb = 100
13/06/20 12:43:56 INFO mapred.MapTask: data buffer = 79691776/99614720
13/06/20 12:43:56 INFO mapred.MapTask: record buffer = 262144/327680
13/06/20 12:43:56 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/hduser/hadoop/./map.py]
13/06/20 12:43:56 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/06/20 12:43:57 INFO streaming.StreamJob: map 0% reduce 0%
13/06/20 12:44:02 INFO mapred.LocalJobRunner: file:/home/hduser/tmp/cddb.txt:0+1205
13/06/20 12:44:03 INFO streaming.StreamJob: map 100% reduce 0%
13/06/20 12:48:11 INFO streaming.PipeMapRed: Records R/W=9/1
13/06/20 12:48:11 INFO streaming.PipeMapRed: MRErrorThread done
13/06/20 12:48:11 INFO streaming.PipeMapRed: mapRedFinished
13/06/20 12:48:11 INFO mapred.MapTask: Starting flush of map output
13/06/20 12:48:11 INFO mapred.MapTask: Finished spill 0
13/06/20 12:48:11 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/06/20 12:48:11 INFO mapred.LocalJobRunner: Records R/W=9/1
13/06/20 12:48:11 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/06/20 12:48:11 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1c84be9
13/06/20 12:48:11 INFO mapred.LocalJobRunner:
13/06/20 12:48:11 INFO mapred.Merger: Merging 1 sorted segments
13/06/20 12:48:11 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1356 bytes
13/06/20 12:48:11 INFO mapred.LocalJobRunner:
13/06/20 12:48:11 INFO streaming.PipeMapRed: PipeMapRed exec [/home/hduser/hduser/hadoop/./red.py]
13/06/20 12:48:11 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
13/06/20 12:48:11 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
Traceback (most recent call last):
  File "/home/hduser/hduser/hadoop/./red.py", line 30, in <module>
    main()
  File "/home/hduser/hduser/hadoop/./red.py", line 19, in main
    for similarity, group in groupby(data, itemgetter(0), reverse=True):
TypeError: groupby() takes at most 2 arguments (3 given)
13/06/20 12:48:11 INFO streaming.PipeMapRed: MRErrorThread done
13/06/20 12:48:11 INFO streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
13/06/20 12:48:11 WARN mapred.LocalJobRunner: job_local_0001
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
13/06/20 12:48:12 INFO streaming.StreamJob: Job running in-process (local Hadoop)
13/06/20 12:48:12 ERROR streaming.StreamJob: Job not successful. Error: NA
13/06/20 12:48:12 INFO streaming.StreamJob: killJob...
Streaming Command Failed!

I am using Hadoop 1.0.4 and wrote the map and reduce scripts in Python (using Hadoop Streaming).

Best answer

The error is clear from the Python traceback in the log:

Traceback (most recent call last):
  File "/home/hduser/hduser/hadoop/./red.py", line 30, in <module>
    main()
  File "/home/hduser/hduser/hadoop/./red.py", line 19, in main
    for similarity, group in groupby(data, itemgetter(0), reverse=True):
TypeError: groupby() takes at most 2 arguments (3 given)

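groupby accepts only two arguments: the iterable and an optional key function. It has no reverse parameter, so the call on line 19 of red.py raises the TypeError, the reducer subprocess exits with code 1, and Hadoop Streaming reports the job as failed. See the documentation for groupby.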
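One possible way to get the descending order that reverse=True was presumably meant to provide is to sort the records first and then call groupby with two arguments. The sketch below is only an illustration: red.py is not shown in the question, so the (similarity, item) record shape, the function name group_by_similarity, and the sample data are all assumptions, and groupby is assumed to be itertools.groupby.

```python
from itertools import groupby
from operator import itemgetter


def group_by_similarity(data):
    """Group (similarity, item) pairs by similarity, highest similarity first.

    Hypothetical helper: the record layout is assumed, not taken from red.py.
    """
    # groupby only merges adjacent records with equal keys, so sort first.
    # The descending order belongs to sorted(), not to groupby().
    ordered = sorted(data, key=itemgetter(0), reverse=True)
    return [
        (similarity, [item for _, item in group])
        for similarity, group in groupby(ordered, key=itemgetter(0))
    ]


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [(0.5, "a"), (0.9, "b"), (0.5, "c")]
    for similarity, items in group_by_similarity(sample):
        print(similarity, items)
```

Sorting inside the reducer is needed here because Hadoop Streaming sorts the reducer input by key in ascending order as text, which may not match a descending numeric order.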

Regarding "hadoop - Hadoop Streaming error when using Python MapReduce", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/17206637/
