
apache-spark - AWS EKS Spark 3.0, Hadoop 3.2 error - NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException

Reposted. Author: 行者123  Updated: 2023-12-02 20:14:52

I am running JupyterHub on EKS and want to use the EKS IRSA feature to run Spark workloads on Kubernetes. I previously used Kube2IAM, but I now plan to migrate to IRSA.
This error is not caused by IRSA: the service account is properly attached to both the driver and executor pods, and I can access S3 from both via the CLI and the SDK. The problem is with Spark accessing S3 on Spark 3.0 / Hadoop 3.2:
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. : java.lang.NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException
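As background on the error itself: the missing class ships in the AWS S3 SDK jar, and the S3A connector in hadoop-aws 3.2.0 is built against the all-in-one aws-java-sdk-bundle rather than the individual aws-java-sdk artifacts, so mixing them is a common cause of this NoClassDefFoundError. A quick way to confirm whether a jar on the classpath actually bundles the class is a zip lookup; this is a minimal sketch that uses a synthetic jar purely for demonstration (a real check would point at the jars under $SPARK_HOME/jars):

```python
import os
import tempfile
import zipfile

def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar (a zip archive) bundles the given class.

    class_name uses the slash form from the stack trace, e.g.
    'com/amazonaws/services/s3/model/MultiObjectDeleteException'.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return (class_name + ".class") in jar.namelist()

# Demo with a synthetic jar; the class name is the one from the error.
target = "com/amazonaws/services/s3/model/MultiObjectDeleteException"
demo_jar = os.path.join(tempfile.mkdtemp(), "demo.jar")
with zipfile.ZipFile(demo_jar, "w") as jar:
    jar.writestr(target + ".class", b"")

print(jar_contains_class(demo_jar, target))  # True
```

Running this against aws-java-sdk-bundle-1.11.874.jar should print True, while the individual aws-java-sdk jars may not contain the class the S3A connector expects.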
I am using the following versions -

  • APACHE_SPARK_VERSION = 3.0.1
  • HADOOP_VERSION = 3.2
  • aws-java-sdk-1.11.890
  • hadoop-aws-3.2.0
  • Python 3.7.3

  • I also tested different versions, e.g. aws-java-sdk-1.11.563.jar.

  • If anyone has run into this issue, please share a solution.
    PS: This is not an IAM policy error either; the IAM policies are fine.

    Best Answer

    In the end, everything was resolved with the jars below -

  • hadoop-aws-3.2.0.jar
  • aws-java-sdk-bundle-1.11.874.jar (https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-bundle/1.11.874)

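As an alternative to baking these jars into the container image, Spark can resolve a consistent pair from Maven Central at submit time via --packages; hadoop-aws 3.2.0 declares aws-java-sdk-bundle as its dependency, so pinning both coordinates keeps the versions aligned. A hypothetical spark-submit sketch (the script name is a placeholder):

```shell
# Sketch only: let Spark pull the matching S3A jars from Maven Central
# instead of copying them into the image by hand.
PACKAGES="org.apache.hadoop:hadoop-aws:3.2.0,com.amazonaws:aws-java-sdk-bundle:1.11.874"

spark-submit \
  --packages "$PACKAGES" \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  your_job.py  # placeholder script name
```

Whichever route you take, the key point is that the hadoop-aws and aws-java-sdk-bundle versions must be a compatible pair.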
  • For anyone trying to run Spark on EKS with IRSA, this is the correct Spark configuration -
    from pyspark.sql import SparkSession

    spark = SparkSession.builder \
        .appName("pyspark-data-analysis-1") \
        .config("spark.kubernetes.driver.master", "k8s://https://xxxxxx.gr7.ap-southeast-1.eks.amazonaws.com:443") \
        .config("spark.kubernetes.namespace", "jupyter") \
        .config("spark.kubernetes.container.image", "xxxxxx.dkr.ecr.ap-southeast-1.amazonaws.com/spark-ubuntu-3.0.1") \
        .config("spark.kubernetes.container.image.pullPolicy", "Always") \
        .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark") \
        .config("spark.kubernetes.authenticate.executor.serviceAccountName", "spark") \
        .config("spark.kubernetes.executor.annotation.eks.amazonaws.com/role-arn", "arn:aws:iam::xxxxxx:role/spark-irsa") \
        .config("spark.hadoop.fs.s3a.aws.credentials.provider", "com.amazonaws.auth.WebIdentityTokenCredentialsProvider") \
        .config("spark.kubernetes.authenticate.submission.caCertFile", "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt") \
        .config("spark.kubernetes.authenticate.submission.oauthTokenFile", "/var/run/secrets/kubernetes.io/serviceaccount/token") \
        .config("spark.hadoop.fs.s3a.multiobjectdelete.enable", "false") \
        .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
        .config("spark.hadoop.fs.s3a.fast.upload", "true") \
        .config("spark.executor.instances", "1") \
        .config("spark.executor.cores", "3") \
        .config("spark.executor.memory", "10g") \
        .getOrCreate()
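A note on the credentials provider used above: with IRSA, the EKS pod identity webhook injects the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables (plus a projected token file) into pods running under an annotated service account, and com.amazonaws.auth.WebIdentityTokenCredentialsProvider reads exactly those. A small sanity check you can run in the driver pod before building the session (the helper name is my own, not part of any SDK):

```python
import os

# The two variables the EKS pod identity webhook injects for IRSA and
# that WebIdentityTokenCredentialsProvider relies on.
REQUIRED_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")

def missing_irsa_vars(env=os.environ):
    """Return the IRSA environment variables that are absent."""
    return [name for name in REQUIRED_VARS if name not in env]

# Example: run inside the driver pod; an empty list means the webhook
# injected both variables and the provider should be able to assume
# the role.
print(missing_irsa_vars({"AWS_ROLE_ARN": "arn:aws:iam::123456789012:role/spark-irsa",
                         "AWS_WEB_IDENTITY_TOKEN_FILE": "/var/run/secrets/token"}))  # []
```

If either variable is missing, the service account annotation or the webhook is the first thing to check, not the Spark configuration.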

    Regarding apache-spark - AWS EKS Spark 3.0, Hadoop 3.2 error - NoClassDefFoundError: com/amazonaws/services/s3/model/MultiObjectDeleteException, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64625111/
