
python - MRJob fails to start new jobs on EMR when using --pool-emr-job-flows

Reposted. Author: 可可西里. Updated: 2023-11-01 16:59:09

I am using MRJob to run an iterative Hadoop program on Amazon's EMR.

Without the "--pool-emr-job-flows" option everything works (but slowly). With the option, I get the following traceback:

Traceback (most recent call last):
  File "ic_bfs_eval.py", line 297, in <module>
    res = main()
  File "ic_bfs_eval.py", line 262, in main
    frac, mr_rounds = bfs(db_name, T, samples, total_steps_cap)
  File "ic_bfs_eval.py", line 183, in bfs
    runner.run()
  File "/Library/Python/2.7/site-packages/mrjob-0.4.3_dev-py2.7.egg/mrjob/runner.py", line 620, in __exit__
    self.cleanup()
  File "/Library/Python/2.7/site-packages/mrjob-0.4.3_dev-py2.7.egg/mrjob/emr.py", line 987, in cleanup
    super(EMRJobRunner, self).cleanup(mode=mode)
  File "/Library/Python/2.7/site-packages/mrjob-0.4.3_dev-py2.7.egg/mrjob/runner.py", line 566, in cleanup
    self._cleanup_job()
  File "/Library/Python/2.7/site-packages/mrjob-0.4.3_dev-py2.7.egg/mrjob/emr.py", line 1061, in _cleanup_job
    self._opts['ec2_key_pair_file'])
  File "/Library/Python/2.7/site-packages/mrjob-0.4.3_dev-py2.7.egg/mrjob/ssh.py", line 209, in ssh_terminate_single_job
    num_jobs_match = HADOOP_JOB_LIST_NUM_RE.match(job_list_lines[0])
IndexError: list index out of range

I am initializing the MRJob like this:

mrJob2 = MRBFSSampleIter(args=["-c", "~/mrjob.conf",
                               "-r", "emr",
                               "--no-output",
                               "--output-dir", tmp_dir_out,
                               "--pool-emr-job-flows", tmp_dir_in])

Any ideas about why this happens?

Best answer

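This went away for me once I set up an SSH key pair. I think it is still a bug, since SSH is supposed to be optional. The simplest workaround, though, is to set up the key pair as described at http://mrjob.readthedocs.org/en/latest/guides/emr-quickstart.html#configuring-ssh-credentials.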

On the topic of "python - MRJob fails to start new jobs on EMR when using --pool-emr-job-flows", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26654294/
