
python - Error loading a file into the EMR distributed cache with elastic-mapreduce

Reposted. Author: 可可西里. Updated: 2023-11-01 15:32:44

I am launching the cluster with the following command.

./elastic-mapreduce --create \
--stream \
--cache s3n://bucket_name/code/totalInstallUsers#totalInstallUsers \
--input s3n://bucket_name/input \
--output s3n://bucket_name/output \
--mapper s3n://bucket_name/code/mapper.py \
--reducer s3n://bucket_name \
--jobflow-role EMR_EC2_DefaultRole \
--service-role EMR_DefaultRole \
--debug \
--log-uri s3n://bucket_name/logs

I always get the following error message. If I remove the --cache option, the cluster launches successfully.

Error: undefined method `each' for #<String:0x00000002c28ba0>
/home/ubuntu/data_processing/commands.rb:806:in `steps'
/home/ubuntu/data_processing/commands.rb:1232:in `block in enact'
/home/ubuntu/data_processing/commands.rb:1232:in `map'
/home/ubuntu/data_processing/commands.rb:1232:in `enact'
/home/ubuntu/data_processing/commands.rb:49:in `block in enact'
/home/ubuntu/data_processing/commands.rb:49:in `each'
/home/ubuntu/data_processing/commands.rb:49:in `enact'
/home/ubuntu/data_processing/commands.rb:2422:in `create_and_execute_commands'
/home/ubuntu/data_processing/elastic-mapreduce-cli.rb:13:in `<top (required)>'
/usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
/usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
./elastic-mapreduce:6:in `<main>'

The reason I use --cache is that I want mapper.py to be able to open the data file with "with open('./totalInstallUsers', 'r') as infile:".
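For illustration, here is a minimal sketch of what such a mapper might look like once the cached file is available in the task's working directory. The file format (one entry per line), the tab-separated input records, and the function names load_lookup, map_lines and main are all assumptions made for this example; they are not taken from the original question.

```python
import sys

# Name of the cached file as it should appear in the task's working
# directory (the part after '#' in the --cache argument).
CACHE_NAME = "./totalInstallUsers"


def load_lookup(path=CACHE_NAME):
    """Read the side file into a set, assuming one entry per line."""
    with open(path, "r") as infile:
        return {line.strip() for line in infile if line.strip()}


def map_lines(lines, lookup):
    """Yield 'key<TAB>1' for records whose first tab-separated field
    is present in the lookup set (hypothetical record layout)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] in lookup:
            yield "%s\t1" % fields[0]


def main():
    """Streaming entry point: read records from stdin, emit to stdout."""
    lookup = load_lookup()
    for out in map_lines(sys.stdin, lookup):
        print(out)
```

In an actual mapper.py the file would end with a call to main(). The sketch can be tried locally by placing a file named totalInstallUsers next to the script and piping sample records into it.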

Can anyone give me a clue? Thanks.

Best answer

I am posting the solution I arrived at here in the hope that it helps others. Using the AWS CLI for EMR, the command looks like this:

aws emr create-cluster \
--name "cluster--name" \
--enable-debugging \
--log-uri s3://bucket-name/logs \
--ami-version 3.7.0 \
--use-default-roles \
--ec2-attributes KeyName=your-key \
--instance-type m3.xlarge \
--instance-count 3 \
--auto-terminate \
--steps file://./streaming.json

And streaming.json looks like this:
[
  {
    "Type": "STREAMING",
    "Name": "Streaming program",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "Args": [
      "-files", "s3://bucket-name/code/mapper.py,s3://bucket-name/code/reducer.py",
      "-mapper", "mapper.py",
      "-reducer", "reducer.py",
      "-input", "s3://bucket-name/input",
      "-output", "s3://bucket-name/output",
      "-cacheFile", "s3://bucket-name/code/data-file-name#new-file-name"
    ]
  }
]
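If the step definition needs to be produced for different buckets or file names, it could be generated rather than edited by hand. The following is only a sketch under stated assumptions: the helper names build_streaming_step and write_steps are hypothetical, and the layout (code under code/, data under input/ and output/) simply mirrors the JSON above. With -cacheFile, the name after "#" is the name under which the file is linked in the task's working directory, so the mapper would open "./new-file-name".

```python
import json


def build_streaming_step(bucket, mapper, reducer, cache_file, link_name):
    """Return a step list shaped like the streaming.json shown above.

    All parameters are placeholders supplied by the caller; the S3
    layout is an assumption copied from the example.
    """
    code = "s3://%s/code" % bucket
    return [{
        "Type": "STREAMING",
        "Name": "Streaming program",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "Args": [
            "-files", "%s/%s,%s/%s" % (code, mapper, code, reducer),
            "-mapper", mapper,
            "-reducer", reducer,
            "-input", "s3://%s/input" % bucket,
            "-output", "s3://%s/output" % bucket,
            # '<s3 uri>#<link name>': the file is exposed to tasks as link_name
            "-cacheFile", "%s/%s#%s" % (code, cache_file, link_name),
        ],
    }]


def write_steps(path, steps):
    """Write the step list as JSON, for use with --steps file://<path>."""
    with open(path, "w") as f:
        json.dump(steps, f, indent=2)
```

For example, write_steps("streaming.json", build_streaming_step("bucket-name", "mapper.py", "reducer.py", "data-file-name", "new-file-name")) would write a file equivalent to the one above.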

Regarding "python - Error loading a file into the EMR distributed cache with elastic-mapreduce", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30113961/
