
python - MRJob error when running on a Hadoop cluster

Reposted. Author: 可可西里. Updated: 2023-11-01 15:06:36

I am trying to run a Python job on a Hadoop cluster using MRJob. My wrapper script is as follows:

#!/bin/bash

. /etc/profile
module load use.own
module load python/python2.7
module load python/mrjob

python path_to_python-script/mr_word_freq_count.py path_to_input_file/input.txt -r hadoop > path_to_output_file/output.txt  # note: the output file already exists before I submit the job

I then submit this script to the cluster with qsub myscript.sh.

I get two files back: an output file and an error file.

The error file contains the following:

no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
Traceback (most recent call last):
  File "homefolder/privatemodules/python/examples/mr_word_freq_count.py", line 37, in <module>
    MRWordFreqCount.run()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 500, in run
    mr_job.execute()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 518, in execute
    super(MRJob, self).execute()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 146, in execute
    self.run_job()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 206, in run_job
    with self.make_runner() as runner:
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/job.py", line 541, in make_runner
    return super(MRJob, self).make_runner()
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/launch.py", line 164, in make_runner
    return HadoopJobRunner(**self.hadoop_job_runner_kwargs())
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 179, in __init__
    super(HadoopJobRunner, self).__init__(**kwargs)
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/runner.py", line 352, in __init__
    self._opts = self.OPTION_STORE_CLASS(self.alias, opts, conf_paths)
  File "/homefolder/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 132, in __init__
    'you must set $HADOOP_HOME, or pass in hadoop_home explicitly')
Exception: you must set $HADOOP_HOME, or pass in hadoop_home explicitly

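First question: how do I find $HADOOP_HOME? When I run echo $HADOOP_HOME, nothing is printed, which means it is not set. So if I do have to set it, which path should I set it to? Should it be the path of the Hadoop name_node in the cluster?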

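Second question: what does the "no configs found" error mean? Is it related to $HADOOP_HOME not being set, or does it expect some other configuration file to be passed in explicitly?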

Any help is much appreciated.

Thanks in advance!

Best answer

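First, $HADOOP_HOME should be set to the path of the local Hadoop installation on your machine. Almost all Hadoop applications assume that $HADOOP_HOME/bin/hadoop is the hadoop executable. So if Hadoop is installed in the system default path, you should export HADOOP_HOME=/usr/; otherwise you should export HADOOP_HOME=/path/to/hadoop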
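If you are not sure where Hadoop is installed, one way to work it out is to start from the hadoop executable and walk up two directories, since $HADOOP_HOME/bin/hadoop is expected to be that executable. The following is only a rough sketch of that idea, not something mrjob provides: the function names are made up for illustration, it assumes the hadoop command is on your PATH, and it uses shutil.which, which needs Python 3 (on Python 2.7, distutils.spawn.find_executable plays a similar role).

```python
import os
import shutil


def hadoop_home_from_executable(hadoop_bin):
    """Return the directory X such that X/bin/hadoop is the given executable.

    Symlinks are resolved first, so a link such as /usr/bin/hadoop that
    points into a real install tree yields that tree's root.
    """
    real_path = os.path.realpath(hadoop_bin)
    bin_dir = os.path.dirname(real_path)   # .../bin
    return os.path.dirname(bin_dir)        # parent of bin


def guess_hadoop_home():
    """Return $HADOOP_HOME if it is set, else a guess from PATH, else None."""
    home = os.environ.get("HADOOP_HOME")
    if home:
        return home
    hadoop_bin = shutil.which("hadoop")    # hypothetical: assumes hadoop is on PATH
    if hadoop_bin is None:
        return None
    return hadoop_home_from_executable(hadoop_bin)


if __name__ == "__main__":
    guess = guess_hadoop_home()
    if guess is None:
        print("hadoop not found on PATH; ask your cluster admin for the install path")
    else:
        print("export HADOOP_HOME=" + guess)
```

Running this on the node where the job is submitted prints a candidate export line that could be added to the wrapper script before the python command. Treat the result as a guess and check that the bin/hadoop file under it really exists.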

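Second, you can give mrjob a specific configuration; if you do not, mrjob falls back on auto-configuration. In most cases, setting HADOOP_HOME and relying on auto-configuration is enough. For advanced usage, see http://pythonhosted.org/mrjob/guides/configs-basics.html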
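To see why the "no configs found" message appears, it can help to check the places where mrjob looks for a config file. As far as I know from the mrjob documentation, these are the file named by the MRJOB_CONF environment variable, then ~/.mrjob.conf, then /etc/mrjob.conf, and the first one found is used; the exact list may differ between mrjob versions, so take it as an assumption. The helper names below are made up for illustration and are not part of mrjob.

```python
import os


def candidate_mrjob_conf_paths(environ=None):
    """List the config locations to check, in priority order.

    Assumed order: $MRJOB_CONF (if set), ~/.mrjob.conf, /etc/mrjob.conf.
    """
    env = os.environ if environ is None else environ
    paths = []
    if env.get("MRJOB_CONF"):
        paths.append(env["MRJOB_CONF"])
    paths.append(os.path.expanduser("~/.mrjob.conf"))
    paths.append("/etc/mrjob.conf")
    return paths


def existing_mrjob_confs(environ=None):
    """Return the candidate paths that exist as regular files."""
    return [p for p in candidate_mrjob_conf_paths(environ) if os.path.isfile(p)]


if __name__ == "__main__":
    found = existing_mrjob_confs()
    if found:
        print("config file(s) present: " + ", ".join(found))
    else:
        print("no config file present; mrjob would auto-configure")
```

If nothing is found, the message is informational: the traceback in the question comes from the missing HADOOP_HOME, not from the missing config file.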

Regarding "python - MRJob error when running on a Hadoop cluster", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/20589431/
