
amazon-web-services - AWS EMR throws an exception on accelerate endpoint configuration

Reposted. Author: 可可西里. Updated: 2023-11-01 15:09:20

This is the EMR step I am using:

s3-dist-cp --targetSize 1000 --outputCodec=gz --s3Endpoint=bucket.s3-accelerate.amazonaws.com --groupBy './(\d\d)/\d\d/\d\d/.' --src s3a://sourcebucket/ --dest s3a://destbucket/

It fails with an exception on the accelerate endpoint.

EMR version:

Release label: emr-5.13.0
Hadoop distribution: Amazon 2.8.3
Applications: Hive 2.3.2, Pig 0.17.0, Hue 4.1.0, Presto 0.194

What am I missing in the arguments passed to s3-dist-cp to get past this error?

Exception in thread "main" com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: To enable accelerate mode, please use AmazonS3ClientBuilder.withAccelerateModeEnabled(true)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache.get(LocalCache.java:3937)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4830)
at com.amazon.ws.emr.hadoop.fs.s3.lite.provider.DefaultS3Provider.getS3(DefaultS3Provider.java:55)
at com.amazon.ws.emr.hadoop.fs.s3.lite.provider.DefaultS3Provider.getS3(DefaultS3Provider.java:22)
at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.getClient(GlobalS3Executor.java:122)
at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:89)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:176)
at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.doesBucketExist(AmazonS3LiteClient.java:88)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.ensureBucketExists(Jets3tNativeFileSystemStore.java:138)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:116)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.initialize(S3NativeFileSystem.java:448)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:109)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2859)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2896)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2878)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:392)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:869)
at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:705)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.IllegalStateException: To enable accelerate mode, please use AmazonS3ClientBuilder.withAccelerateModeEnabled(true)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.setEndpoint(AmazonS3Client.java:670)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.AmazonWebServiceClient.withEndpoint(AmazonWebServiceClient.java:897)
at com.amazon.ws.emr.hadoop.fs.s3.lite.provider.DefaultS3Provider$S3CacheLoader.load(DefaultS3Provider.java:62)
at com.amazon.ws.emr.hadoop.fs.s3.lite.provider.DefaultS3Provider$S3CacheLoader.load(DefaultS3Provider.java:58)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
at com.amazon.ws.emr.hadoop.fs.shaded.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
... 30 more
Command exiting with ret '1'

Best answer

s3-dist-cp is built on top of the hadoop-aws library, which does not support accelerated buckets out of the box.

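You will need to build your own jar that depends on hadoop-aws and aws-java-sdk-s3, pass the required parameters through to it, and extend the S3 client factory to enable accelerated uploads.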

Example Maven dependencies:

<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-s3</artifactId>
</dependency>
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-core</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>${hadoop.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
</dependency>

S3 client factory:

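package example_package;

import java.io.IOException;
import java.net.URI;

import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketAccelerateConfiguration;
import com.amazonaws.services.s3.model.BucketAccelerateStatus;
import org.apache.hadoop.fs.s3a.DefaultS3ClientFactory;

// The overridden signatures must match the DefaultS3ClientFactory shipped with
// the hadoop-aws version on your classpath; check them against ${hadoop.version}.
public class AcceleratedS3ClientFactory extends DefaultS3ClientFactory {
    // Build the client through the builder so that accelerate mode can be enabled.
    @Override
    protected AmazonS3 newAmazonS3Client(AWSCredentialsProvider credentials, ClientConfiguration awsConf) {
        AmazonS3ClientBuilder s3Builder = AmazonS3ClientBuilder
                .standard()
                // withRegion expects a region name, not the accelerate hostname.
                // "us-east-1" is a placeholder: use the bucket's own region.
                .withRegion("us-east-1")
                .enableAccelerateMode();
        s3Builder.setCredentials(credentials);
        s3Builder.setClientConfiguration(awsConf);

        return s3Builder.build();
    }

    @Override
    public AmazonS3 createS3Client(URI name) throws IOException {
        AmazonS3 s3 = super.createS3Client(name);
        // Turn on Transfer Acceleration for the bucket itself.
        // Load the bucket name below from the step configuration as well.
        s3.setBucketAccelerateConfiguration("bucket-name",
                new BucketAccelerateConfiguration(BucketAccelerateStatus.Enabled));

        return s3;
    }
}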

The last step is to tell Hadoop which S3 client factory class to use:

<property>
    <name>fs.s3a.s3.client.factory.impl</name>
    <value>example_package.AcceleratedS3ClientFactory</value>
</property>

This can also be done on the command line, so you can specify it directly in the EMR console or through the EMR SDK.
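As a rough sketch of the command-line route: tools started through Hadoop's ToolRunner (as S3DistCp is in the stack trace above) accept generic options of the form -Dproperty=value, placed before the tool's own arguments. The helper below only composes such an argument list; the class name GenericOptions and its method are hypothetical, and a custom jar honours these options only if its main class also goes through ToolRunner or GenericOptionsParser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: prepends Hadoop generic -D options to a tool's arguments. */
final class GenericOptions {
    private GenericOptions() {}

    /**
     * Returns a new list with one "-Dkey=value" entry per property,
     * followed by the tool-specific arguments in their original order.
     */
    static List<String> withProperties(Map<String, String> props, List<String> toolArgs) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            // Generic options have to come before the application's own arguments.
            out.add("-D" + e.getKey() + "=" + e.getValue());
        }
        out.addAll(toolArgs);
        return out;
    }
}
```

Called with the property fs.s3a.s3.client.factory.impl=example_package.AcceleratedS3ClientFactory and the arguments --src and --dest, the resulting list can be used as the argument list of an EMR step.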

For the copy itself, you can use the Hadoop FileUtil.copy API, where you specify the source and destination along with the required configuration.

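For some specific file formats, or for sources or destinations that are not file-system based, consider using Spark on top of this utility. In some cases it can speed up the transfer.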

Now you can submit a step with the jar to EMR:

aws emr add-steps --cluster-id cluster_id \
--steps Type=CUSTOM_JAR,Name="a step name",Jar=s3://app/my-s3distcp-1.0.jar,\
Args=["key","value"]

Put all the required parameters into Args, such as the source and destination S3 paths.
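One possible way for the jar's main class to read those arguments, assuming they arrive as alternating key/value pairs as in the Args=["key","value"] example above. The class StepArgs and the key names used in the note below are hypothetical, not part of any AWS or Hadoop API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: turns ["key1", "value1", "key2", "value2"] into a map. */
final class StepArgs {
    private StepArgs() {}

    /** Parses alternating key/value arguments; rejects an odd number of arguments. */
    static Map<String, String> parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "Expected key/value pairs, got " + args.length + " arguments");
        }
        Map<String, String> parsed = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            parsed.put(args[i], args[i + 1]);
        }
        return parsed;
    }

    /** Returns the value for a key, failing early if it is missing or empty. */
    static String require(Map<String, String> parsed, String key) {
        String value = parsed.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing required argument: " + key);
        }
        return value;
    }
}
```

With a step defined as Args=["src","s3a://sourcebucket/","dest","s3a://destbucket/"], calling StepArgs.require(StepArgs.parse(args), "src") would yield the source path; the bucket name needed by the client factory could be passed the same way.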

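Note: do not specify the bucket-specific endpoint option that hadoop-aws supports. hadoop-aws uses it in a way that is incompatible with acceleration, and you will get the same exception every time.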

Regarding "amazon-web-services - AWS EMR throws an exception on accelerate endpoint configuration", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/50554493/
