
python - Changing the MapReduce intermediate output location with MRJob

Reposted. Author: 可可西里. Updated: 2023-11-01 16:16:13

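I am trying to run a Python script with MRJob on a cluster where I do not have admin permissions, and I get the error pasted below. What I think is happening is that the job tries to write its intermediate files to the default /tmp.... directory, and since that is a protected directory I cannot write to, the job hits an error and exits. I would like to know how to change this tmp output directory to somewhere in my local filesystem, for example /home/myusername/some_path_in_my_local_filesystem_on_the_cluster. In short: which additional parameters do I have to pass to move the intermediate output location from /tmp/... to a local place where I have write permission?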

I invoke my script as:

python myscript.py  input.txt -r hadoop > output.txt

Error:

no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /tmp/13435.1.all.q/mr_word_freq_count.myusername.20131215.004905.274232
writing wrapper script to /tmp/13435.1.all.q/mr_word_freq_count.myusername.20131215.004905.274232/setup-wrapper.sh
STDERR: mkdir: org.apache.hadoop.security.AccessControlException: Permission denied: user=myusername, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Traceback (most recent call last):
  File "/home/myusername/privatemodules/python/examples/mr_word_freq_count.py", line 37, in <module>
    MRWordFreqCount.run()
  File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/job.py", line 500, in run
    mr_job.execute()
  File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/job.py", line 518, in execute
    super(MRJob, self).execute()
  File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/launch.py", line 146, in execute
    self.run_job()
  File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/launch.py", line 207, in run_job
    runner.run()
  File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 236, in _run
    self._upload_local_files_to_hdfs()
  File "/home/myusername/.local/lib/python2.7/site-packages/mrjob/hadoop.py", line 263, in _upload_local_files_to_hdfs
    self._mkdir_on_hdfs(self._upload_mgr.prefix)

Best answer

Are you running mrjob as a "local" job, or are you trying to run it on your Hadoop cluster?
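The log in the question gives a hint here: the line that fails is an HDFS `AccessControlException`, which suggests the denied write is on HDFS rather than on the local /tmp. Below is a minimal sketch for pulling the fields out of that line, assuming the message format is exactly as pasted in the question. `parse_access_denied` is a hypothetical helper written for illustration; it is not part of mrjob or Hadoop.

```python
import re

# Pattern for the AccessControlException text as it appears in the question's log.
# The field layout (user, access, inode path, owner, group, mode) is assumed
# from that single example line.
_DENIED = re.compile(
    r'Permission denied: user=(?P<user>[^,]+), access=(?P<access>\w+), '
    r'inode="(?P<path>[^"]*)":(?P<owner>[^:]+):(?P<group>[^:]+):(?P<mode>\S+)'
)


def parse_access_denied(line):
    """Return the fields of an HDFS permission error as a dict, or None."""
    match = _DENIED.search(line)
    return match.groupdict() if match else None


log_line = (
    'STDERR: mkdir: org.apache.hadoop.security.AccessControlException: '
    'Permission denied: user=myusername, access=WRITE, '
    'inode="/":hdfs:supergroup:drwxr-xr-x'
)
info = parse_access_denied(log_line)
print(info)
```

Here the denied inode is `/`, owned by `hdfs:supergroup`, so the job appears to be creating its directory directly under the HDFS root, where an ordinary user has no write access.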

If you really are trying to use it on Hadoop, you can control the "scratch" HDFS location (where mrjob stores its intermediate files) with the --base-tmp-dir flag:

python mr.py -r hadoop -o hdfs:///user/you/output_dir --base-tmp-dir hdfs:///user/you/tmp  hdfs:///user/you/data.txt
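If the job is launched from another Python script, the same command line can be assembled as an argument list. This is only a sketch: `build_mrjob_command` is a hypothetical helper, the HDFS paths are the placeholders from the answer, and the `--base-tmp-dir` flag name is taken from the answer rather than checked against any particular mrjob release.

```python
import shlex
import sys


def build_mrjob_command(script, input_uri, output_uri, tmp_uri, python=sys.executable):
    """Assemble the argv list for the Hadoop run shown in the answer."""
    return [
        python, script,
        "-r", "hadoop",             # run on the Hadoop cluster, not locally
        "-o", output_uri,           # final output directory
        "--base-tmp-dir", tmp_uri,  # flag name as given in the answer (assumption)
        input_uri,
    ]


cmd = build_mrjob_command(
    "mr.py",
    "hdfs:///user/you/data.txt",
    "hdfs:///user/you/output_dir",
    "hdfs:///user/you/tmp",
    python="python",
)
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
# To actually launch it: subprocess.run(cmd, check=True)
```

If the installed mrjob version rejects the flag, `python mr.py --help` lists the options that version accepts.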

Regarding python - Changing the MapReduce intermediate output location with MRJob, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20590110/
