python-3.x - Running an AWS Glue job in a Docker container outputs "com.amazonaws.SdkClientException: Failed to connect to service endpoint:"

Reposted by: 行者123. Updated: 2023-12-04 09:36:39

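I am developing an AWS Glue job locally with Docker (using pyspark). I have a Python file (song_data.py) that contains a Glue job using the GlueContext class. When I run gluesparksubmit glue_etl_scripts/song_data.py --JOB-NAME test in the container's terminal to execute the job script, I get the following error: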

20/06/24 02:12:54 WARN EC2MetadataUtils: Unable to retrieve the requested metadata (/latest/dynamic/instance-identity/document). Failed to connect to service endpoint: 
com.amazonaws.SdkClientException: Failed to connect to service endpoint: 
        at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:100)
        at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:70)
        at com.amazonaws.internal.InstanceMetadataServiceResourceFetcher.readResource(InstanceMetadataServiceResourceFetcher.java:75)
        at com.amazonaws.internal.EC2ResourceFetcher.readResource(EC2ResourceFetcher.java:66)
        at com.amazonaws.util.EC2MetadataUtils.getItems(EC2MetadataUtils.java:402)
        at com.amazonaws.util.EC2MetadataUtils.getData(EC2MetadataUtils.java:371)
        at com.amazonaws.util.EC2MetadataUtils.getData(EC2MetadataUtils.java:367)
        at com.amazonaws.util.EC2MetadataUtils.getEC2InstanceRegion(EC2MetadataUtils.java:282)
        at com.amazonaws.regions.InstanceMetadataRegionProvider.tryDetectRegion(InstanceMetadataRegionProvider.java:59)
        at com.amazonaws.regions.InstanceMetadataRegionProvider.getRegion(InstanceMetadataRegionProvider.java:50)
        at com.amazonaws.regions.AwsRegionProviderChain.getRegion(AwsRegionProviderChain.java:46)
        at com.amazonaws.services.glue.util.EndpointConfig$.getConfig(EndpointConfig.scala:42)
        at com.amazonaws.services.glue.util.AWSConnectionUtils$.<init>(AWSConnectionUtils.scala:36)
        at com.amazonaws.services.glue.util.AWSConnectionUtils$.<clinit>(AWSConnectionUtils.scala)
        at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:152)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:607)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
        at sun.net.www.http.HttpClient.New(HttpClient.java:339)
        at sun.net.www.http.HttpClient.New(HttpClient.java:357)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1226)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1205)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
        at com.amazonaws.internal.ConnectionUtils.connectToEndpoint(ConnectionUtils.java:52)
        at com.amazonaws.internal.EC2ResourceFetcher.doReadResource(EC2ResourceFetcher.java:80)
        ... 25 more
An error occurred while calling o28.getCatalogSource.
: java.lang.ExceptionInInitializerError
        at com.amazonaws.services.glue.GlueContext.getCatalogSource(GlueContext.scala:152)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Unable to load region information from any provider in the chain
        at com.amazonaws.regions.AwsRegionProviderChain.getRegion(AwsRegionProviderChain.java:59)
        at com.amazonaws.services.glue.util.EndpointConfig$.getConfig(EndpointConfig.scala:42)
        at com.amazonaws.services.glue.util.AWSConnectionUtils$.<init>(AWSConnectionUtils.scala:36)
        at com.amazonaws.services.glue.util.AWSConnectionUtils$.<clinit>(AWSConnectionUtils.scala)
        ... 12 more

The error is raised when the glueContext.create_dynamic_frame.from_catalog() method is called in the Glue job file (song_data.py):

from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark import SQLContext
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from configparser import ConfigParser

config = ConfigParser()
config.read_file(open('/usr/local/src/config/aws.cfg'))

sc = SparkContext.getOrCreate()
hadoop_conf = sc._jsc.hadoopConfiguration()

hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3a.access.key", config.get('AWS', 'KEY'))
hadoop_conf.set("fs.s3a.secret.key", config.get('AWS', 'SECRET'))
hadoop_conf.set("fs.s3a.endpoint", "s3.us-west-2.amazonaws.com")

sql = SQLContext(sc)

glueContext = GlueContext(sql)

try:
    song_df = glueContext.create_dynamic_frame.from_catalog(
            database='sparkify',
            table_name='song_data')

    print ('Count: ', song_df.count())
    print('Schema: ')
    song_df.printSchema()
except Exception as e:
    print(e)

I have tried:

Changing the Hadoop configuration from fs.s3a to fs.s3, which uses different access/secret key properties:

hadoop_conf.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoop_conf.set("fs.s3.awsAccessKeyId", config.get('AWS', 'KEY'))
hadoop_conf.set("fs.s3.awsSecretAccessKey", config.get('AWS', 'SECRET'))
hadoop_conf.set("fs.s3.endpoint", "s3.us-west-2.amazonaws.com")

Using GlueContext's create_dynamic_frame_from_catalog() method instead of create_dynamic_frame.from_catalog():

song_df = glueContext.create_dynamic_frame_from_catalog(
             database='sparkify',
             table_name='song_data')

Removing the Hadoop endpoint configuration: #hadoop_conf.set("fs.s3a.endpoint", "s3.us-west-2.amazonaws.com")

Updated attempt

Changed song_data.py to:

conf = (
    SparkConf()
        .set('spark.hadoop.fs.s3a.access.key', config.get('AWS', 'KEY'))
        .set('spark.hadoop.fs.s3a.secret.key', config.get('AWS', 'SECRET'))
        .set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
)

sc = SparkContext(conf=conf)

spark = SparkSession(sc)

glueContext = GlueContext(spark)

try:
    print('Attempt 1:')
    song_df = glueContext.create_dynamic_frame.from_options(
            connection_type='s3',
            connection_options={"paths": [ "s3a://sparkify-dend-analytics"]},
            format='json')

    print ('Count: ', song_df.count())
    print('Schema: ')
    song_df.printSchema()
except Exception as e:
    print(e)

try:
    print('Attempt 2:')
    song_df = glueContext.create_dynamic_frame.from_catalog(
            database='sparkify',
            table_name='song_data')

    print ('Count: ', song_df.count())
    print('Schema: ')
    song_df.printSchema()
except Exception as e:
    print(e)

try:
    print('Attempt 3:')
    song_df = glueContext.create_dynamic_frame_from_catalog(
            database='sparkify',
            table_name='song_data')

    print ('Count: ', song_df.count())
    print('Schema: ')
    song_df.printSchema()
except Exception as e:
    print(e)

Error output

Attempt 1:

An error occurred while calling o37.getDynamicFrame.

: org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on
 sparkify-dend-analytics: com.amazonaws.AmazonClientException: No AWS 
Credentials provided by DefaultAWSCredentialsProviderChain : 
com.amazonaws.SdkClientException: Unable to load AWS credentials from any 
provider in the chain: [EnvironmentVariableCredentialsProvider: Unable to 
load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or 
AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), 
SystemPropertiesCredentialsProvider: Unable to load AWS credentials from 
Java system properties (aws.accessKeyId and aws.secretKey), 
WebIdentityTokenCredentialsProvider: You must specify a value for roleArn
 and roleSessionName, com.amazonaws.auth.profile.ProfileCredentialsProvider@xxxxxxxx: 
profile file cannot be null, com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper@xxxxxxxx: Failed
 to connect to service endpoint: ]

Attempt 2:

EC2MetadataUtils: Unable to retrieve the requested metadata (/latest/dynamic/instance-identity/document). Failed to connect to service endpoint: 
com.amazonaws.SdkClientException: Failed to connect to service endpoint: 
......
Caused by: java.net.ConnectException: Connection refused (Connection refused)
......
Caused by: com.amazonaws.SdkClientException: Unable to load region information from any provider in the chain

Attempt 3:

An error occurred while calling o32.getCatalogSource.
: java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.glue.util.AWSConnectionUtils$
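
The Attempt 1 message names the environment variables that the environment-variable provider in the credentials chain looks for. A minimal sketch for checking, from inside the container, which of them are set. The helper name missing_credential_vars is hypothetical, and it covers only that one provider, not the system-property, profile, or container providers also listed in the message:

```python
import os

# Variable names copied from the Attempt 1 error message. Either name in
# a group satisfies the environment-variable credentials provider.
CREDENTIAL_VARS = (
    ("AWS_ACCESS_KEY_ID", "AWS_ACCESS_KEY"),
    ("AWS_SECRET_KEY", "AWS_SECRET_ACCESS_KEY"),
)


def missing_credential_vars(env=None):
    """Return the groups of variable names for which no variable is set.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [group for group in CREDENTIAL_VARS
            if not any(env.get(name) for name in group)]


if __name__ == "__main__":
    for group in missing_credential_vars():
        print("Not set: " + " or ".join(group))
```

An empty result only means the variables are present in the process that ran the check; it says nothing about whether the keys themselves are valid.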

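Best answer

A Glue job running locally inside a Docker container does not have access to the Glue catalog.

Read the data directly from S3 with from_options instead of reading it from the catalog: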

from_options(connection_type, connection_options={}, format=None, format_options={}, transformation_ctx="")

See the documentation for this method.
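
As a rough sketch of how the arguments to from_options might be assembled for an S3 read: the helper s3_json_source below is hypothetical, the bucket name in the usage note comes from the question, and whether the s3:// or s3a:// scheme works is an assumption that depends on how the local Hadoop filesystem is configured.

```python
def s3_json_source(bucket, prefix="", scheme="s3"):
    """Build keyword arguments for create_dynamic_frame.from_options.

    Hypothetical helper; it fills in only the connection_type,
    connection_options and format parameters of the signature above.
    """
    path = f"{scheme}://{bucket}"
    if prefix:
        # Normalise the prefix so the path has exactly one separator.
        path += "/" + prefix.strip("/")
    return {
        "connection_type": "s3",
        "connection_options": {"paths": [path]},
        "format": "json",
    }


if __name__ == "__main__":
    print(s3_json_source("sparkify-dend-analytics", scheme="s3a"))
```

Inside the job script the result would be passed as glueContext.create_dynamic_frame.from_options(**s3_json_source("sparkify-dend-analytics", scheme="s3a")), which mirrors Attempt 1 in the question. The read still needs credentials that the S3 client can find.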

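Update: you are getting a region error, which is common when running Glue locally.

Try running the following command to provide your region. It is used to initialize the library, and the job will still run locally: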

export AWS_REGION=us-east-1
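
The stack trace in the question shows the region lookup falling through to the EC2 instance metadata provider, which refuses the connection from a local container. A small sketch for confirming that the variable is visible before submitting the job. check_region is a hypothetical helper, and it assumes the export was done in the same shell or container environment that launches gluesparksubmit:

```python
import os


def check_region(env=None):
    """Return the value of AWS_REGION, or None if it is unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return env.get("AWS_REGION") or None


if __name__ == "__main__":
    region = check_region()
    if region is None:
        print("AWS_REGION is not set; export it before submitting the job")
    else:
        print("AWS_REGION =", region)
```

Setting the variable from inside song_data.py is likely too late, because under spark-submit the JVM is already running when the Python script starts. The S3 endpoint in the question is in us-west-2, so the value to export is presumably the region where the bucket and catalog actually live.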

For this topic, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/62546743/
