gpt4 book ai didi

python - Spark 簇错误: ClassNotFoundException

转载 作者:可可西里 更新时间:2023-11-01 14:51:54 26 4
gpt4 key购买 nike

我使用 spark 框架处理大数据、hadoop 文件系统和集群管理器 YARN。当我尝试使用命令 spark-submit --deploy-mode cluster --master yarn streaming.py 运行我的 python 应用程序时我收到一个错误:

 16/12/19 15:42:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedAvatarFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
at org.apache.spark.deploy.yarn.Client$$anonfun$7.apply(Client.scala:123)
at org.apache.spark.deploy.yarn.Client$$anonfun$7.apply(Client.scala:123)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.yarn.Client.<init>(Client.scala:123)
at org.apache.spark.deploy.yarn.Client.<init>(Client.scala:70)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1178)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedAvatarFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 23 more

在 Spark Standalone 模式下一切正常。


我的配置:

yarn-site.xml

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>

<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>CapacityScheduler</value>
</property>

</configuration>

hdfs-site.xml

<configuration>

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.http.address</name>
<value>127.0.0.1:50070</value>
</property>

<property>
<name>dfs.secondary.http.address</name>
<value>0.0.0.0:0</value>
</property>

<property>
<name>dfs.blockreport.intervalMsec</name>
<value>300000</value>
</property>

<property>
<name>dfs.fullblockreport.magnifier</name>
<value>2</value>
</property>

<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:0</value>
</property>

<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:0</value>
</property>

<property>
<name>dfs.datanode.ipc.address</name>
<value>0.0.0.0:0</value>
</property>

<property>
<name>dfs.datanode.handler.count</name>
<value>3</value>
</property>

<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

<property>
<name>dfs.data.dir</name>
<value>{{DataNode-volumes}}</value>
</property>

<property>
<name>dfs.block.invalidate.limit</name>
<value>100</value>
</property>

<property>
<name>dfs.safemode.extension</name>
<value>10000</value>
</property>

<property>
<name>dfs.namenode.dn-address</name>
<value>0.0.0.0:9015</value>
</property>

</configuration>

核心站点.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://0.0.0.0:9000</value>
</property>

<property>
<name>fs.default.name0</name>
<value>hdfs://0.0.0.0:9000</value>
</property>

<property>
<name>fs.default.name1</name>
<value>hdfs://0.0.0.0:9010</value>
</property>

<property>
<name>fs.checkpoint.period</name>
<value>600</value>
</property>

<property>
<name>fs.checkpoint.size</name>
<value>10000000</value>
</property>

<property>
<name>fs.ha.zookeeper.quorum</name>
<value>{{zookeeper-quorum}}</value>
</property>

<property>
<name>ipc.client.connect.max.retries</name>
<value>10</value>
</property>

<property>
<name>ipc.client.connect.timeout</name>
<value>5</value>
</property>

<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedAvatarFileSystem</value>
</property>

<property>
<name>fs.ha.zookeeper.cache</name>
<value>true</value>
</property>

<property>
<name>fs.ha.zookeeper.timeout</name>
<value>30000</value>
</property>

<property>
<name>fs.ha.retrywrites</name>
<value>true</value>
</property>
</configuration>

最佳答案

在我的案例中帮助更换

核心站点.xml

<property>
<name>fs.hdfs.impl</name>
<value>org.apache.hadoop.hdfs.DistributedAvatarFileSystem</value>
</property>

DistributedAvatarFileSystem

DistributedFileSystem

关于python - Spark 簇错误: ClassNotFoundException,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/41224782/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com