
java - Apache Flink Job cluster rpc.address binds to localhost on Kubernetes

Reposted. Author: 行者123. Updated: 2023-12-02 02:58:46

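I am trying to run a Flink Job cluster (1.8.1) in a Kubernetes environment. I built a Docker image containing my job jar by following this doc.

I followed the kubefiles to create the job, the job manager, and the task managers. The problem is that the task managers cannot connect to the job manager and keep crashing.

While debugging the job manager logs, I found that jobmanager.rpc.address is bound to "localhost".

However, I passed the arguments in the kube files as described in this doc.

I also tried setting jobmanager.rpc.address through an environment variable (FLINK_ENV_JAVA_OPTS):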

  env:
    - name: FLINK_ENV_JAVA_OPTS
      value: "-Djobmanager.rpc.address=flink-job-cluster"

Job manager console log:

Starting the job-cluster
Starting standalonejob as a console application on host flink-job-cluster-bbxrn.
2019-07-16 17:31:10,759 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-07-16 17:31:10,760 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting StandaloneJobClusterEntryPoint (Version: <unknown>, Rev:4caec0d, Date:03.04.2019 @ 13:25:54 PDT)
2019-07-16 17:31:10,760 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - OS current user: flink
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Current Hadoop/Kerberos user: <no hadoop dependency found>
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM: OpenJDK 64-Bit Server VM - IcedTea - 1.8/25.212-b04
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Maximum heap size: 989 MiBytes
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JAVA_HOME: /usr/lib/jvm/java-1.8-openjdk/jre
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - No Hadoop Dependency available
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - JVM Options:
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms1024m
2019-07-16 17:31:10,761 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx1024m
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Djobmanager.rpc.address=flink-job-cluster
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:/opt/flink-1.8.1/conf/log4j-console.properties
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:/opt/flink-1.8.1/conf/logback-console.xml
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Program Arguments:
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --configDir
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - /opt/flink-1.8.1/conf
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --job-classname
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - wikiedits.WikipediaAnalysis
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --host
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - flink-job-cluster
2019-07-16 17:31:10,762 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Djobmanager.rpc.address=flink-job-cluster
2019-07-16 17:31:10,763 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dparallelism.default=2
2019-07-16 17:31:10,763 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dblob.server.port=6124
2019-07-16 17:31:10,763 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dqueryable-state.server.ports=6125
2019-07-16 17:31:10,763 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Classpath: /opt/flink-1.8.1/lib/log4j-1.2.17.jar:/opt/flink-1.8.1/lib/slf4j-log4j12-1.7.15.jar:/opt/flink-1.8.1/lib/wiki-edits-0.1.jar:/opt/flink-1.8.1/lib/flink-dist_2.11-1.8.1.jar:::
2019-07-16 17:31:10,763 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --------------------------------------------------------------------------------
2019-07-16 17:31:10,764 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-07-16 17:31:10,850 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost
2019-07-16 17:31:10,851 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123
2019-07-16 17:31:10,851 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m
2019-07-16 17:31:10,851 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m
2019-07-16 17:31:10,851 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2019-07-16 17:31:10,851 INFO org.apache.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1

The log above shows rpc.address bound to localhost instead of flink-job-cluster.

I assume the task managers' messages are being dropped by Akka RPC because it is bound to localhost:6123.

2019-07-16 17:31:12,546 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - Request slot with profile ResourceProfile{cpuCores=-1.0, heapMemoryInMB=-1, directMemoryInMB=0, nativeMemoryInMB=0, networkMemoryInMB=0} for job 38190f2570cd5f0a0a47f65ddf7aae1f with allocation id 97af00eae7e3dfb31a79232077ea7ee6.
2019-07-16 17:31:14,043 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@flink-job-cluster:6123/]] arriving at [akka.tcp://flink@flink-job-cluster:6123] inbound addresses are [akka.tcp://flink@localhost:6123]
2019-07-16 17:31:26,564 ERROR akka.remote.EndpointWriter - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@flink-job-cluster:6123/]] arriving at [akka.tcp://flink@flink-job-cluster:6123] inbound addresses are [akka.tcp://flink@localhost:6123]

I am not sure why the job manager binds to localhost.

PS: The task manager pods can resolve the flink-job-cluster host. The hostname resolves to the service IP address.

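Best answer

The root cause was that the jobmanager.rpc.address argument value was not applied. For some reason, args written in-line were not set correctly in the Flink global configuration, whereas the same args passed as a multi-line list work fine.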
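
To make the multi-line form concrete, here is a minimal sketch of a container spec fragment with the args written as a YAML list, one item per line. The argument values are taken from the "Program Arguments" section of the job manager log above. The container name, the image name, and the leading "job-cluster" argument are assumptions for illustration and are not confirmed by the original post.

```yaml
# Sketch only, not the poster's actual manifest.
containers:
  - name: flink-job-cluster          # hypothetical container name
    image: my-flink-job:latest       # hypothetical image name
    args:
      - "job-cluster"                # assumed entrypoint mode argument
      - "--job-classname"
      - "wikiedits.WikipediaAnalysis"
      - "-Djobmanager.rpc.address=flink-job-cluster"
      - "-Dparallelism.default=2"
      - "-Dblob.server.port=6124"
      - "-Dqueryable-state.server.ports=6125"
```

If the setting takes effect, the Akka "inbound addresses" reported in the job manager log should name flink-job-cluster rather than localhost.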

Regarding "java - Apache Flink Job cluster rpc.address binds to localhost on Kubernetes", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57062965/
