
amazon-web-services - Spark job reading from S3 on a Spark cluster throws IllegalAccessError: tried to access method MutableCounterLong

Reposted · Author: 可可西里 · Updated: 2023-11-01 14:36:09


I have a Spark cluster on DC/OS, and I am running a Spark job that reads from S3. The versions are as follows:

  • Spark 2.3.1
  • Hadoop 2.7
  • AWS connector dependency: `"org.apache.hadoop" % "hadoop-aws" % "3.0.0-alpha2"`
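
The version mix above is the likely culprit: `hadoop-aws` 3.0.0-alpha2 contains S3A classes compiled against Hadoop 3.x metrics internals, which are binary-incompatible with the Hadoop 2.7 classes already on the cluster classpath, and that mismatch is consistent with the `IllegalAccessError` below. A hedged sketch of a version-aligned dependency set in `build.sbt` (the exact 2.7.x patch level here is an assumption; it should match the Hadoop build bundled with the cluster's Spark distribution):

```scala
// build.sbt — sketch only: keep hadoop-aws on the same Hadoop line as the cluster.
// "2.7.7" is an assumption; use the exact Hadoop 2.7.x version your Spark build ships.
libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-sql"    % "2.3.1" % Provided,
  "org.apache.hadoop" %  "hadoop-aws"   % "2.7.7",
  // hadoop-aws 2.7.x was built against aws-java-sdk 1.7.4, so pin the matching SDK
  "com.amazonaws"     %  "aws-java-sdk" % "1.7.4"
)
```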

I read in the data as follows:

```scala
val hadoopConf = sparkSession.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3a.endpoint", Config.awsEndpoint)
hadoopConf.set("fs.s3a.access.key", Config.awsAccessKey)
hadoopConf.set("fs.s3a.secret.key", Config.awsSecretKey)
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

val data = sparkSession.read.parquet("s3a://" + "path/to/file")
```

The error I get is:

```
Exception in thread "main" java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:194)
at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:215)
at org.apache.hadoop.fs.s3a.S3AInstrumentation.<init>(S3AInstrumentation.java:138)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:170)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:44)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:321)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:559)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:543)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:809)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:182)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:207)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```

The job only fails when I submit it to the cluster as a JAR. If I run the code locally or in a Docker container, it does not fail and reads the data without any problem.

Any help would be greatly appreciated!
