
apache-flink - Purpose of fs.hdfs.hadoopconf in flink-conf.yaml


New to Flink.

I can run the example WordCount.jar against a file on a remote HDFS cluster without declaring the fs.hdfs.hadoopconf variable in the Flink conf.

So I'd like to know what exactly that variable is for. Does declaring it change how the example jar runs?

Command:

flink-cluster.vm ~]$ /opt/flink/bin/flink run  /opt/flink/examples/batch/WordCount.jar --input hdfs://hadoop-master:9000/tmp/test-events

Output:

.......
07/13/2016 00:50:13 Job execution switched to status FINISHED.
(foo,1)
.....
(bar,1)
(one,1)

Setup:

  • Remote HDFS cluster at hdfs://hadoop-master.vm:9000
  • Flink cluster running on flink-cluster.vm

Thanks

Update:
As Serhiy pointed out, I declared fs.hdfs.hadoopconf in the conf, but got the following error when running the job with the updated argument hdfs:///tmp/test-events.1468374669125

flink-conf.yaml

# You can also directly specify the paths to hdfs-default.xml and hdfs-site.xml
# via keys 'fs.hdfs.hdfsdefault' and 'fs.hdfs.hdfssite'.
#
fs.hdfs.hadoopconf: hdfs://hadoop-master:9000/
fs.hdfs.hdfsdefault : hdfs://hadoop-master:9000/

命令:

flink-cluster.vm ~]$ /opt/flink/bin/flink run  /opt/flink/examples/batch/WordCount.jar --input hdfs:///tmp/test-events

输出:

Caused by: org.apache.flink.runtime.JobException: Creating the input splits caused an error: The given HDFS file URI (hdfs:///tmp/test-events.1468374669125) did not describe the HDFS NameNode. The attempt to use a default HDFS configuration, as specified in the 'fs.hdfs.hdfsdefault' or 'fs.hdfs.hdfssite' config parameter failed due to the following problem: Either no default file system was registered, or the provided configuration contains no valid authority component (fs.default.name or fs.defaultFS) describing the (hdfs namenode) host and port.
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:172)
at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:679)
at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:1026)
... 19 more
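
The root cause is spelled out in the message: Flink could not find a valid authority (fs.default.name or fs.defaultFS) naming the NameNode host and port, which is exactly what a Hadoop core-site.xml normally provides. A minimal sketch of such a file, using the NameNode address from the setup above (the file itself is illustrative, not from the question):

core-site.xml

<?xml version="1.0"?>
<configuration>
  <!-- Default filesystem authority; lets clients resolve short
       hdfs:///path URIs. Host and port taken from this question's setup. -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000</value>
  </property>
</configuration>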

Best Answer

From the documentation:

fs.hdfs.hadoopconf: The absolute path to the Hadoop File System’s (HDFS) configuration directory (OPTIONAL VALUE). Specifying this value allows programs to reference HDFS files using short URIs (hdfs:///path/to/files, without including the address and port of the NameNode in the file URI). Without this option, HDFS files can be accessed, but require fully qualified URIs like hdfs://address:port/path/to/files. This option also causes file writers to pick up the HDFS’s default values for block sizes and replication factors. Flink will look for the “core-site.xml” and “hdfs-site.xml” files in the specified directory.
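
In other words, fs.hdfs.hadoopconf must be a local filesystem path to the directory holding the Hadoop client configuration, not an hdfs:// URI as in the update above. A minimal sketch of the corrected entry, assuming the Hadoop configuration files have been copied to /etc/hadoop/conf on the Flink machines (that path is an assumption, not stated in the question):

flink-conf.yaml

# Local directory containing core-site.xml and hdfs-site.xml
# (/etc/hadoop/conf is an assumed location; point this wherever
# your Hadoop client configuration actually lives)
fs.hdfs.hadoopconf: /etc/hadoop/conf

With fs.defaultFS set in that directory's core-site.xml (see the sketch above), the short-URI form of the command should then resolve the NameNode on its own:

/opt/flink/bin/flink run /opt/flink/examples/batch/WordCount.jar --input hdfs:///tmp/test-events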

On the topic of apache-flink - purpose of fs.hdfs.hadoopconf in flink-conf.yaml, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38341401/
