
scala - HDFS connection error from Scala


I am new to Hadoop and have just started trying to connect to HDFS from Scala and Spark, but I don't know what is wrong with my configuration. Please help me fix it and understand it.

Hadoop Version is 2.7.3
Scala Version is 2.12.1
Spark Version is 2.1.1

pom.xml (dependencies)
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.3</version>
</dependency>

Scala code
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object SparkHDFS {
  def getDataFromHdfs(): Unit = {
    // Connect to the local HDFS namenode
    val hdfs = FileSystem.get(new URI("hdfs://localhost:9000"), new Configuration)
    val file = new Path("rdd/insurance.csv")
    // Open the file and print its first line
    val stream = hdfs.open(file)
    println(stream.readLine())
  }

  def main(arr: Array[String]): Unit = {
    getDataFromHdfs()
  }
}

Exception on the console:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.hdfs.DistributedFileSystem could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2400)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2411)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at com.sample.sparkscala.SparkHDFS$.getDataFromHdfs(SparkHDFS.scala:11)
at com.sample.sparkscala.SparkHDFS$.main(SparkHDFS.scala:18)
at com.sample.sparkscala.SparkHDFS.main(SparkHDFS.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration$DeprecationDelta
at org.apache.hadoop.hdfs.HdfsConfiguration.addDeprecatedKeys(HdfsConfiguration.java:66)
at org.apache.hadoop.hdfs.HdfsConfiguration.<clinit>(HdfsConfiguration.java:31)
at org.apache.hadoop.hdfs.DistributedFileSystem.<clinit>(DistributedFileSystem.java:116)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 12 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration$DeprecationDelta
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 21 more

Best Answer

That is not how you read files in Spark. Spark has built-in support for CSV files.

Follow a tutorial on reading files from HDFS in Spark.

Here is a simple example of reading a CSV file in Spark:

import org.apache.spark.sql.SparkSession

// Local SparkSession for a quick test; adjust master for a real cluster
val spark = SparkSession.builder().appName("ReadCsvFromHdfs").master("local[*]").getOrCreate()

val df = spark.read
  .option("header", "true")        // use the first row as column names
  .option("mode", "DROPMALFORMED") // drop rows that fail to parse
  .csv("hdfs://localhost:9000/rdd/insurance.csv")

You also need to use Scala 2.11.x rather than 2.12.x, since the spark-core_2.11 artifact is compiled against Scala 2.11.
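
Also, the NoClassDefFoundError for org.apache.hadoop.conf.Configuration$DeprecationDelta in your trace typically means an older hadoop-common landed on the classpath next to hadoop-hdfs 2.7.3 (the spark-core 2.1.1 artifact pulls in Hadoop 2.2.0 by default). A minimal sketch of aligned pom.xml dependencies, assuming that version mismatch is the cause; spark-sql is what provides the SparkSession used above:

<!-- Spark core and SQL, both built for Scala 2.11 -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.1</version>
</dependency>

<!-- hadoop-client brings in matching hadoop-common and hadoop-hdfs jars -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.3</version>
</dependency>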

Hope this helps!

Regarding scala - HDFS connection error from Scala, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44541702/
