
python - Error when starting a job on Hadoop using mrjob

Repost. Author: 可可西里. Updated: 2023-11-01 14:45:44

I am new to Hadoop and mrjob, and this book has helped my learning a great deal. I am trying to run mrSVM.py on Hadoop, since it runs fine locally.

However, when I run the command python mrSVM.py -r hadoop kickStart.txt, it fails with the following error:

no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /tmp/mrSVM.manvendra.20140818.075925.908574
writing wrapper script to /tmp/mrSVM.manvendra.20140818.075925.908574/setup-wrapper.sh
Using Hadoop version 2.5.0
Copying local files into hdfs:///user/manvendra/tmp/mrjob/mrSVM.manvendra.20140818.075925.908574/files/
HADOOP: session.id is deprecated. Instead, use dfs.metrics.session-id
HADOOP: Initializing JVM Metrics with processName=JobTracker, sessionId=
HADOOP: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
HADOOP: Cleaning up the staging area file:/tmp/hadoop-manvendra/mapred/staging/manvendra1365509453/.staging/job_local1365509453_0001
HADOOP: Error launching job , bad input path : File does not exist: /tmp/hadoop-manvendra/mapred/staging/manvendra1365509453/.staging/job_local1365509453_0001/archives/mrjob.tar.gz#mrjob.tar.gz
HADOOP: Streaming Command Failed!
Job failed with return code 512: ['/home/manvendra/hadoop-2.5.0/bin/hadoop', 'jar', '/home/manvendra/hadoop-2.5.0/share/hadoop/tools/lib/hadoop-streaming-2.5.0.jar', '-files', 'hdfs:///user/manvendra/tmp/mrjob/mrSVM.manvendra.20140818.075925.908574/files/setup-wrapper.sh#setup-wrapper.sh,hdfs:///user/manvendra/tmp/mrjob/mrSVM.manvendra.20140818.075925.908574/files/mrSVM.py#mrSVM.py', '-archives', 'hdfs:///user/manvendra/tmp/mrjob/mrSVM.manvendra.20140818.075925.908574/files/mrjob.tar.gz#mrjob.tar.gz', '-input', 'hdfs:///user/manvendra/tmp/mrjob/mrSVM.manvendra.20140818.075925.908574/files/kickStart.txt', '-output', 'hdfs:///user/manvendra/tmp/mrjob/mrSVM.manvendra.20140818.075925.908574/step-output/1', '-mapper', 'sh -e setup-wrapper.sh python mrSVM.py --step-num=0 --mapper', '-reducer', 'sh -e setup-wrapper.sh python mrSVM.py --step-num=0 --reducer']
Scanning logs for probable cause of failure
Traceback (most recent call last):
File "mrSVM.py", line 81, in <module>
MRsvm.run()
File "/usr/local/lib/python2.7/dist-packages/mrjob-0.4.3_dev-py2.7.egg/mrjob/job.py", line 462, in run
mr_job.execute()
File "/usr/local/lib/python2.7/dist-packages/mrjob-0.4.3_dev-py2.7.egg/mrjob/job.py", line 480, in execute
super(MRJob, self).execute()
File "/usr/local/lib/python2.7/dist-packages/mrjob-0.4.3_dev-py2.7.egg/mrjob/launch.py", line 147, in execute
self.run_job()
File "/usr/local/lib/python2.7/dist-packages/mrjob-0.4.3_dev-py2.7.egg/mrjob/launch.py", line 210, in run_job
runner.run()
File "/usr/local/lib/python2.7/dist-packages/mrjob-0.4.3_dev-py2.7.egg/mrjob/runner.py", line 464, in run
self._run()
File "/usr/local/lib/python2.7/dist-packages/mrjob-0.4.3_dev-py2.7.egg/mrjob/hadoop.py", line 239, in _run
self._run_job_in_hadoop()
File "/usr/local/lib/python2.7/dist-packages/mrjob-0.4.3_dev-py2.7.egg/mrjob/hadoop.py", line 369, in _run_job_in_hadoop
raise CalledProcessError(returncode, step_args)
subprocess.CalledProcessError: Command '['/home/manvendra/hadoop-2.5.0/bin/hadoop', 'jar', '/home/manvendra/hadoop-2.5.0/share/hadoop/tools/lib/hadoop-streaming-2.5.0.jar', '-files', 'hdfs:///user/manvendra/tmp/mrjob/mrSVM.manvendra.20140818.075925.908574/files/setup-wrapper.sh#setup-wrapper.sh,hdfs:///user/manvendra/tmp/mrjob/mrSVM.manvendra.20140818.075925.908574/files/mrSVM.py#mrSVM.py', '-archives', 'hdfs:///user/manvendra/tmp/mrjob/mrSVM.manvendra.20140818.075925.908574/files/mrjob.tar.gz#mrjob.tar.gz', '-input', 'hdfs:///user/manvendra/tmp/mrjob/mrSVM.manvendra.20140818.075925.908574/files/kickStart.txt', '-output', 'hdfs:///user/manvendra/tmp/mrjob/mrSVM.manvendra.20140818.075925.908574/step-output/1', '-mapper', 'sh -e setup-wrapper.sh python mrSVM.py --step-num=0 --mapper', '-reducer', 'sh -e setup-wrapper.sh python mrSVM.py --step-num=0 --reducer']' returned non-zero exit status 512

Please help me resolve this issue.

Best answer

This is a known issue with Hadoop 2.x and mrjob. Make the following changes, format your NameNode, then restart your Hadoop instance and YARN; everything should work after that.

core-site.xml

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>

hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>

mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>128</value>
<description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
<description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
<description>Physical memory, in MB, to be made available to running containers</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run </description>
</property>
</configuration>
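
Before restarting anything, it can help to confirm that the edited files really contain the values listed above. The following is a minimal sketch, not part of the original answer: the helper names read_hadoop_properties and missing_settings are made up for illustration, and the sample text simply mirrors the mapred-site.xml shown above. In practice you would read the file from your own Hadoop configuration directory instead.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Return a {name: value} dict from the text of a Hadoop *-site.xml file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_settings(props, expected):
    """List (name, wanted, found) for settings that are absent or differ."""
    return [
        (name, want, props.get(name))
        for name, want in expected.items()
        if props.get(name) != want
    ]


# Sample text mirroring the mapred-site.xml from the answer (illustrative only).
MAPRED_SITE = """
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
"""

if __name__ == "__main__":
    props = read_hadoop_properties(MAPRED_SITE)
    # An empty list means every expected setting is present with the wanted value.
    print(missing_settings(props, {"mapreduce.framework.name": "yarn"}))
```

The same check can be repeated for core-site.xml (fs.defaultFS) and yarn-site.xml (yarn.nodemanager.aux-services) by passing a different expected dict.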

Then run:

hdfs namenode -format
start-dfs.sh
start-yarn.sh
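
One way to tell whether the change took effect is to rerun the job and look at the job ID in the output. The log in the question contains job_local1365509453_0001, which, as far as I understand, is the form of ID produced when Hadoop runs the job with its local runner instead of submitting it to YARN. The sketch below is only an illustration of that check: the function name ran_with_local_runner is hypothetical, and the second sample line is a made-up YARN-style line, not real output.

```python
import re

# Job IDs of the form job_local<digits>_<digits>, as seen in the question's log.
LOCAL_JOB_ID = re.compile(r"job_local\d+_\d+")


def ran_with_local_runner(log_text):
    """Return True if the log text mentions a local-runner job ID."""
    return LOCAL_JOB_ID.search(log_text) is not None


# Line copied from the question's log.
FAILING_LINE = (
    "HADOOP: Cleaning up the staging area file:/tmp/hadoop-manvendra/mapred/"
    "staging/manvendra1365509453/.staging/job_local1365509453_0001"
)

# Hypothetical line with a YARN-style job ID, for comparison only.
YARN_STYLE_LINE = "HADOOP: Running job: job_1408348763421_0001"

if __name__ == "__main__":
    print(ran_with_local_runner(FAILING_LINE))
    print(ran_with_local_runner(YARN_STYLE_LINE))
```

If the first kind of ID still appears after the restart, mapreduce.framework.name is probably not being picked up from mapred-site.xml.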

Cheers,

图斯詹坦·库本德拉纳坦

Regarding python - Error when starting a job on Hadoop using mrjob, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25358793/
