gpt4 book ai didi

java - 无法使用java中的spark-redshift库连接到S3

转载 作者:太空宇宙 更新时间:2023-11-04 09:54:21 24 4
gpt4 key购买 nike

我正在尝试基于 Spark 数据集在 Redshift 中创建一个表。我在 jdbc 中使用 Spark-Redshift 驱动程序在本地实现此目的。执行此操作的代码片段

data.write()
.format("com.databricks.spark.redshift")
.option("url", "jdbc:redshift://..")
.option("dbtable", "test_table")
.option("tempdir", "s3://temp")
.option("aws_iam_role", "arn:aws:iam::..")
.option("extracopyoptions", "region 'us-west-1'")
.mode(SaveMode.Append).save();

我的 maven pom.xml 有以下依赖项:

<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-redshift_2.11</artifactId>
<version>2.0.1</version>
</dependency>

我使用的是java 1.8。我收到以下错误:

java.io.IOException: No FileSystem for scheme: s3
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at com.databricks.spark.redshift.Utils$.assertThatFileSystemIsNotS3BlockFileSystem(Utils.scala:156)
at com.databricks.spark.redshift.RedshiftWriter.saveToRedshift(RedshiftWriter.scala:340)
at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:106)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
at com.peak.spark.jobs.SparkDataIngestJob.writeData(SparkDataIngestJob.java:196)
at com.peak.spark.jobs.SparkDataIngestJob.exec(SparkDataIngestJob.java:123)
at com.peak.spark.core.AbstractSparkJob.run(AbstractSparkJob.java:74)
at com.peak.spark.core.SparkAppLauncher.onApplicationEvent(SparkAppLauncher.java:40)
at com.peak.spark.core.SparkAppLauncher.onApplicationEvent(SparkAppLauncher.java:16)
at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:151)
at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:128)
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:331)
at org.springframework.context.support.AbstractApplicationContext.start(AbstractApplicationContext.java:1174)
at com.peak.spark.core.SparkApp.launch(SparkApp.java:38)
at com.peak.spark.core.SparkApp.main(SparkApp.java:55)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Exception in thread "main" java.lang.RuntimeException: Spark Job FailedNo FileSystem for scheme: s3
at com.peak.spark.jobs.SparkDataIngestJob.exec(SparkDataIngestJob.java:162)
at com.peak.spark.core.AbstractSparkJob.run(AbstractSparkJob.java:74)
at com.peak.spark.core.SparkAppLauncher.onApplicationEvent(SparkAppLauncher.java:40)
at com.peak.spark.core.SparkAppLauncher.onApplicationEvent(SparkAppLauncher.java:16)
at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:151)
at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:128)
at org.springframework.context.support.AbstractApplicationContext.publishEvent(AbstractApplicationContext.java:331)
at org.springframework.context.support.AbstractApplicationContext.start(AbstractApplicationContext.java:1174)
at com.peak.spark.core.SparkApp.launch(SparkApp.java:38)
at com.peak.spark.core.SparkApp.main(SparkApp.java:55)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


请帮我找出问题所在。

最佳答案

我想您忘记将 hadoop-aws 包包含到您的项目中。该软件包将允许您使用 s3:// 架构

<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>2.6.0</version>
</dependency>

关于java - 无法使用java中的spark-redshift库连接到S3,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54361818/

24 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com