
java - Apache Spark error offloading data to AWS S3 using Hadoop


I am using Apache Spark v2.3.1 and trying to offload data to AWS S3 after processing it. Something like this:

data.write().parquet("s3a://"+ bucketName + "/"+ location);

The configuration appears to be fine:

String region = System.getenv("AWS_REGION");
String accessKeyId = System.getenv("AWS_ACCESS_KEY_ID");
String secretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY");

spark.sparkContext().hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsRegion", region);
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsAccessKeyId", accessKeyId);
spark.sparkContext().hadoopConfiguration().set("fs.s3a.awsSecretAccessKey", secretAccessKey);
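
For reference, a minimal self-contained sketch of the same write, using the credential property names documented for the S3A connector (fs.s3a.access.key / fs.s3a.secret.key). The bucket name and input path are placeholders, not values from the question:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class S3AWriteSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("s3a-write-sketch")
                .master("local[*]")
                .getOrCreate();

        // Credential properties as documented for the S3A connector in hadoop-aws.
        org.apache.hadoop.conf.Configuration conf = spark.sparkContext().hadoopConfiguration();
        conf.set("fs.s3a.access.key", System.getenv("AWS_ACCESS_KEY_ID"));
        conf.set("fs.s3a.secret.key", System.getenv("AWS_SECRET_ACCESS_KEY"));

        // Placeholder input path and bucket name.
        Dataset<Row> data = spark.read().json("input.json");
        data.write().parquet("s3a://my-bucket/output/");

        spark.stop();
    }
}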

%HADOOP_HOME% points to exactly the same version as the one used by Spark (v2.6.5) and has been added to the Path:

C:\>hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  key                  manage keys via the KeyProvider
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
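
As a sanity check, the Hadoop version the Spark driver is actually linked against can be printed from code via org.apache.hadoop.util.VersionInfo (part of hadoop-common); a small sketch, which in this setup should report 2.6.5:

import org.apache.hadoop.util.VersionInfo;

public class PrintHadoopVersion {
    public static void main(String[] args) {
        // Prints the version of hadoop-common on the driver's classpath,
        // which should match the 2.6.5 reported by the hadoop CLI above.
        System.out.println("Hadoop version: " + VersionInfo.getVersion());
    }
}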

The same goes for Maven:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>2.6.5</version>
</dependency>

But I still get the following error when writing. Any ideas?

Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method) ~[hadoop-common-2.6.5.jar:?]
at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:557) ~[hadoop-common-2.6.5.jar:?]
at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977) ~[hadoop-common-2.6.5.jar:?]

Best answer

Yes, I missed a step. Put the binaries from https://github.com/steveloughran/winutils/tree/master/hadoop-2.6.4/bin into %HADOOP_HOME%\bin. This seems to work even though the versions do not match exactly (v2.6.5 vs. v2.6.4).
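
Hadoop's Shell utilities also honour the hadoop.home.dir JVM system property before falling back to the HADOOP_HOME environment variable, so the directory holding winutils.exe can be pointed to from the driver code as well. A minimal sketch, where C:\hadoop is a placeholder path that must contain bin\winutils.exe from the repository above:

import org.apache.spark.sql.SparkSession;

public class WinutilsWorkaround {
    public static void main(String[] args) {
        // Point Hadoop at a directory containing bin\winutils.exe before any
        // Hadoop code runs; "C:\\hadoop" is a placeholder path.
        System.setProperty("hadoop.home.dir", "C:\\hadoop");

        SparkSession spark = SparkSession.builder()
                .appName("s3a-write")
                .master("local[*]")
                .getOrCreate();

        // ... configure fs.s3a.* and write as in the question ...
        spark.stop();
    }
}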

Regarding "java - Apache Spark error offloading data to AWS S3 using Hadoop", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51533335/
