gpt4 book ai didi

amazon-web-services - 使用 lambda 函数通过 spark 步骤创建 AWS EMR 集群失败并显示 "Local file does not exist"

转载 作者:行者123 更新时间:2023-12-05 03:05:02 24 4
gpt4 key购买 nike

我正在尝试使用 Lambda 函数通过 Spark 步骤启动 EMR 集群。

这是我的 lambda 函数(python 2.7):

import boto3

def lambda_handler(event, context):
conn = boto3.client("emr")
cluster_id = conn.run_job_flow(
Name='LSR Batch Testrun',
ServiceRole='EMR_DefaultRole',
JobFlowRole='EMR_EC2_DefaultRole',
VisibleToAllUsers=True,
LogUri='s3n://aws-logs-171256445476-ap-southeast-2/elasticmapreduce/',
ReleaseLabel='emr-5.16.0',
Instances={
"Ec2SubnetId": "<my-subnet>",
'InstanceGroups': [
{
'Name': 'Master nodes',
'Market': 'ON_DEMAND',
'InstanceRole': 'MASTER',
'InstanceType': 'm3.xlarge',
'InstanceCount': 1,
},
{
'Name': 'Slave nodes',
'Market': 'ON_DEMAND',
'InstanceRole': 'CORE',
'InstanceType': 'm3.xlarge',
'InstanceCount': 2,
}
],
'KeepJobFlowAliveWhenNoSteps': False,
'TerminationProtected': False
},
Applications=[{
'Name': 'Spark',
'Name': 'Hive'
}],
Configurations=[
{
"Classification": "hive-site",
"Properties": {
"hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
}
},
{
"Classification": "spark-hive-site",
"Properties": {
"hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
}
}
],
Steps=[{
'Name': 'mystep',
'ActionOnFailure': 'TERMINATE_CLUSTER',
'HadoopJarStep': {
'Jar': 's3://elasticmapreduce/libs/script-runner/script-runner.jar',
'Args': [
"/home/hadoop/spark/bin/spark-submit", "--deploy-mode", "cluster",
"--master", "yarn-cluster", "--class", "org.apache.spark.examples.SparkPi",
"s3://support.elasticmapreduce/spark/1.2.0/spark-examples-1.2.0-hadoop2.4.0.jar", "10"
]
}
}],
)
return "Started cluster {}".format(cluster_id)

集群正在启动,但在尝试执行该步骤时失败了。错误日志包含以下异常:

Exception in thread "main" java.lang.RuntimeException: Local file does not exist.
at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.fetchFile(ScriptRunner.java:30)
at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.main(ScriptRunner.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)

所以脚本运行器似乎不理解从 S3 获取 .jar 文件?

感谢任何帮助...

最佳答案

我最终可以解决问题。主要问题是损坏的“应用程序”配置,它必须如下所示:

Applications=[{
'Name': 'Spark'
},
{
'Name': 'Hive'
}],

最后的 Steps 元素:

   Steps=[{
'Name': 'lsr-step1',
'ActionOnFailure': 'TERMINATE_CLUSTER',
'HadoopJarStep': {
'Jar': 'command-runner.jar',
'Args': [
"spark-submit", "--class", "org.apache.spark.examples.SparkPi",
"s3://support.elasticmapreduce/spark/1.2.0/spark-examples-1.2.0-hadoop2.4.0.jar", "10"
]
}
}]

关于amazon-web-services - 使用 lambda 函数通过 spark 步骤创建 AWS EMR 集群失败并显示 "Local file does not exist",我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/51549234/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com