gpt4 book ai didi

apache-spark - PySpark 在 YARN 客户端模式下运行,但在 "User did not initialize spark context!"的集群模式下失败

转载 作者:行者123 更新时间:2023-12-05 05:49:27 32 4
gpt4 key购买 nike

  • 标准数据处理图像 2.0
  • Ubuntu 18.04 LTS
  • Hadoop 3.2
  • Spark 3.1

我正在测试在 dataproc pyspark 集群上运行一个非常简单的脚本:

测试部门.py

import os
os.listdir('./')

我可以在客户端模式下运行 testing_dep.py(dataproc 上的默认模式):

gcloud dataproc jobs submit pyspark ./testing_dep.py --cluster=pyspark-monsoon --region=us-central1

但是,当我尝试在集群模式下运行相同的作业时,出现错误:

gcloud dataproc jobs submit pyspark ./testing_dep.py --cluster=pyspark-monsoon --region=us-central1 --properties=spark.submit.deployMode=cluster

错误日志:

Job [417443357bcd43f99ee3dc60f4e3bfea] submitted.
Waiting for job output...
22/01/12 05:32:20 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at monsoon-testing-m/10.128.15.236:8032
22/01/12 05:32:20 INFO org.apache.hadoop.yarn.client.AHSProxy: Connecting to Application History server at monsoon-testing-m/10.128.15.236:10200
22/01/12 05:32:22 INFO org.apache.hadoop.conf.Configuration: resource-types.xml not found
22/01/12 05:32:22 INFO org.apache.hadoop.yarn.util.resource.ResourceUtils: Unable to find 'resource-types.xml'.
22/01/12 05:32:24 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1641965080466_0001
22/01/12 05:32:42 ERROR org.apache.spark.deploy.yarn.Client: Application diagnostics message: Application application_1641965080466_0001 failed 2 times due to AM Container for appattempt_1641965080466_0001_000002 exited with exitCode: 13
Failing this attempt.Diagnostics: [2022-01-12 05:32:42.154]Exception from container-launch.
Container id: container_1641965080466_0001_02_000001
Exit code: 13

[2022-01-12 05:32:42.203]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
22/01/12 05:32:40 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception:
java.lang.IllegalStateException: User did not initialize spark context!
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:520)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:899)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:898)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)


[2022-01-12 05:32:42.203]Container exited with a non-zero exit code 13. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
22/01/12 05:32:40 ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception:
java.lang.IllegalStateException: User did not initialize spark context!
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:520)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:268)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:899)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:898)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:898)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)


For more detailed output, check the application tracking page: http://monsoon-testing-m:8188/applicationhistory/app/application_1641965080466_0001 Then click on links to logs of each attempt.
. Failing the application.
Exception in thread "main" org.apache.spark.SparkException: Application application_1641965080466_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1242)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1634)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ERROR: (gcloud.dataproc.jobs.submit.pyspark) Job [417443357bcd43f99ee3dc60f4e3bfea] failed with error:
Google Cloud Dataproc Agent reports job failure. If logs are available, they can be found at:
https://console.cloud.google.com/dataproc/jobs/417443357bcd43f99ee3dc60f4e3bfea?project=monsoon-credittech&region=us-central1
gcloud dataproc jobs wait '417443357bcd43f99ee3dc60f4e3bfea' --region 'us-central1' --project 'monsoon-credittech'
https://console.cloud.google.com/storage/browser/monsoon-credittech.appspot.com/google-cloud-dataproc-metainfo/64632294-3e9b-4c55-af8a-075fc7d6f412/jobs/417443357bcd43f99ee3dc60f4e3bfea/
gs://monsoon-credittech.appspot.com/google-cloud-dataproc-metainfo/64632294-3e9b-4c55-af8a-075fc7d6f412/jobs/417443357bcd43f99ee3dc60f4e3bfea/driveroutput



你能帮我理解我做错了什么以及为什么这段代码失败了吗?

最佳答案

在 YARN 集群模式下运行 Spark 时会出现错误,但作业不会创建 Spark 上下文。查看ApplicationMaster.scala的源代码.

为避免此错误,您需要创建一个 SparkContext 或 SparkSession,例如:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
.appName('MySparkApp') \
.getOrCreate()

客户端模式不经过相同的代码路径,也没有类似的检查。

关于apache-spark - PySpark 在 YARN 客户端模式下运行,但在 "User did not initialize spark context!"的集群模式下失败,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/70668449/

32 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com