gpt4 book ai didi

Java/Scala 远程 HDFS 使用

转载 作者:可可西里 更新时间:2023-11-01 15:07:03 34 4
gpt4 key购买 nike

我正在尝试连接到远程 HDFS 集群。我已经阅读了一些文档并开始使用,但没有找到如何做到这一点的最佳解决方案。情况:我在 xxx-something.com 上有 HDFS。我可以通过 SSH 连接到它,一切正常。

但我想做的是,将文件从它获取到我的本地机器。

我做了什么:

我已经在我的 conf 文件夹中创建了 core-site.xml(我正在创建 Play! 应用程序)。我已经将 fs.default.name 配置更改为 hdfs://xxx-something.com:8020(不确定端口)。然后我尝试启动一个简单的测试:

val conf = new Configuration()
conf.addResource(new Path("conf/core-site.xml"))
val fs = FileSystem.get(conf)
val status = fs.listStatus(new Path("/data/"))

我收到错误:

 13:56:09.012 [specs2.DefaultExecutionStrategy1] WARN  org.apache.hadoop.conf.Configuration - conf/core-site.xml:a attempt to override final parameter: fs.trash.interval;  Ignoring.
13:56:09.012 [specs2.DefaultExecutionStrategy1] WARN org.apache.hadoop.conf.Configuration - conf/core-site.xml:a attempt to override final parameter: hadoop.tmp.dir; Ignoring.
13:56:09.013 [specs2.DefaultExecutionStrategy1] WARN org.apache.hadoop.conf.Configuration - conf/core-site.xml:a attempt to override final parameter: fs.checkpoint.dir; Ignoring.
13:56:09.022 [specs2.DefaultExecutionStrategy1] DEBUG org.apache.hadoop.fs.FileSystem - Creating filesystem for hdfs://xxx-something.com:8021
13:56:09.059 [specs2.DefaultExecutionStrategy1] DEBUG org.apache.hadoop.conf.Configuration - java.io.IOException: config()
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:226)
at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:213)
at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:53)
at org.apache.hadoop.net.NetUtils.<clinit>(NetUtils.java:62)

提前致谢!

更新:可能端口错了。现在我将它设置为 22,我仍然遇到同样的错误,但在 3 次之后它确实说:

14:01:01.877 [specs2.DefaultExecutionStrategy1] DEBUG org.apache.hadoop.ipc.Client - Connecting to xxx-something.com/someIp:22
14:01:02.187 [specs2.DefaultExecutionStrategy1] DEBUG org.apache.hadoop.ipc.Client - IPC Client (47) connection to xxx-something.com/someIp:22 from britva sending #0
14:01:02.188 [IPC Client (47) connection to xxx-something.com/someIp:22 from britva] DEBUG org.apache.hadoop.ipc.Client - IPC Client (47) connection to xxx-something.com/someIp:22 from britva: starting, having connections 1
14:01:02.422 [IPC Client (47) connection to xxx-something.com/someIp:22 from britva] DEBUG org.apache.hadoop.ipc.Client - IPC Client (47) connection to xxx-something.com/someIp:22 from britva got value #1397966893

之后:

Call to xxx-something.com/someIp:22 failed on local exception: java.io.EOFException
java.io.IOException: Call to xxx-something.com/someIp:22 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1103)
at org.apache.hadoop.ipc.Client.call(Client.java:1071)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:118)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:222)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:187)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1328)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:65)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1346)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:122)
at HdfsSpec$$anonfun$1$$anonfun$apply$3.apply(HdfsSpec.scala:33)
at HdfsSpec$$anonfun$1$$anonfun$apply$3.apply(HdfsSpec.scala:17)
at testingSupport.specs2.MyNotifierRunner$$anon$2$$anon$1.executeBody(MyNotifierRunner.scala:16)
at testingSupport.specs2.MyNotifierRunner$$anon$2$$anon$1.execute(MyNotifierRunner.scala:16)
Caused by: java.io.EOFException
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:807)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:745)

这是什么意思?

最佳答案

您需要在运行名称节点(HDFS 主服务器)的服务器上的 $HADOOP_HOME/conf/core-site.xml 中找到 fs.default.name 属性以获得正确的端口。它可能是 8020,也可能是其他东西。这就是你应该使用的。确保您和服务器之间没有防火墙禁止端口上的连接。

关于Java/Scala 远程 HDFS 使用,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/15293460/

34 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com