gpt4 book ai didi

java - Kafka Streams 应用程序无休止的再平衡

转载 作者:行者123 更新时间:2023-12-01 14:21:03 24 4
gpt4 key购买 nike

我们正在运行一个 kafka 流应用程序,但遇到了一个奇怪的问题。我们同时使用全局状态存储和多个其他状态存储。

我们的应用程序已经加载了所有数据,状态存储中现在有大量的信息。现在,当我们尝试关闭应用程序并再次将其恢复(一些配置更改)时,它会进入无休止的重新平衡......为了验证我们恢复了配置更改,但它仍然停留在那个阶段。没有错误等

INFO  o.apache.kafka.streams.KafkaStreams - stream-client [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb] Started Streams client
INFO o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2] State transition from RUNNING to PARTITIONS_REVOKED
INFO o.apache.kafka.streams.KafkaStreams - stream-client [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb] State transition from RUNNING to REBALANCING
INFO o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2] partition revocation took 1 ms.
suspended active tasks: []
suspended standby tasks: []
INFO o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1] State transition from RUNNING to PARTITIONS_REVOKED
INFO o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1] partition revocation took 0 ms.
suspended active tasks: []
suspended standby tasks: []
04:02:13.682 6985 [main] INFO com..... - Started Application in 6.647 seconds (JVM running for 7.484)
04:02:23.300 16603 [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1] INFO o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED
04:02:23.300 16603 [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2] INFO o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2] State transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED
04:02:23.328 16631 [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1] INFO o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1] partition assignment took 28 ms.
current active tasks: [0_0, 1_0, 2_0, 3_0, 4_0, 5_0, 6_0, 7_5, 8_5, 9_5, 10_5, 12_4, 13_4, 14_4, 15_4, 16_4, 17_4, 19_3, 20_3, 21_3, 22_3, 23_3, 24_3, 25_3, 29_0]
current standby tasks: [0_2]
previous active tasks: []

04:02:23.328 16631 [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2] INFO o.a.k.s.p.internals.StreamThread - stream-thread [app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2] partition assignment took 28 ms.
current active tasks: [0_3, 1_3, 2_3, 3_3, 4_3, 5_3, 7_2, 8_2, 9_2, 10_2, 12_1, 13_1, 14_1, 15_1, 16_1, 17_1, 19_0, 20_0, 21_0, 22_0, 23_0, 24_0, 25_0, 26_0]
current standby tasks: [0_5]
previous active tasks: []
04:03:47.602 100905 [http-nio-8080-exec-10] INFO c.j.d.r.b.p.base.StreamsRestService - State of Kafka Streams Application: REBALANCING
04:03:49.356 102659 [http-nio-8080-exec-2] INFO c.j.d.r.b.p.base.StreamsRestService - State of Kafka Streams Application: REBALANCING
04:03:51.600 104903 [http-nio-8080-exec-3] INFO c.j.d.r.b.p.base.StreamsRestService - State of Kafka Streams Application: REBALANCING
04:03:53.356 106659 [http-nio-8080-exec-4] INFO c.j.d.r.b.p.base.StreamsRestService - State of Kafka Streams Application: REBALANCING

Number of topics - 100
Partitions per topic - 6. (7 topics with 1 partition only)
kubernetes env - 3 pods ( 2 stream threads )

当我们尝试使用以下命令列出消费者组时
root@bastion-0:/app/confluent-5.2.2/bin# ./kafka-consumer-groups --describe --group app  --bootstrap-server kafka-0..local:9094 --command-config /app/client-sasl-ssl.properties --members

CONSUMER-ID HOST CLIENT-ID #PARTITIONS
app-b8c729c9-dc1c-457b-8120-457035e84e58-StreamThread-1-consumer-3b370697-e737-411c-af28-fb04cfbae1dd 1.1.1.1/1.1.1.1 app-b8c729c9-dc1c-457b-8120-457035e84e58-StreamThread-1-consumer 45
app-aaef2f83-d51c-4b6f-bbd8-616db988bd33-StreamThread-2-consumer-3edb3e5f-9f1a-499f-8732-6cd2c8b96c96 2.2.2.2/2.2.2.2 app-aaef2f83-d51c-4b6f-bbd8-616db988bd33-StreamThread-2-consumer 45
app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1-consumer-00e24df4-5669-4e2c-a775-8f6c4f689714 3.3.3.3/3.3.3.3 app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-1-consumer 46
app-b8c729c9-dc1c-457b-8120-457035e84e58-StreamThread-2-consumer-1b6b2955-5dfd-4be7-8ad9-9f1b54fe6310 1.1.1.1/1.1.1.1 app-b8c729c9-dc1c-457b-8120-457035e84e58-StreamThread-2-consumer 45
app-aaef2f83-d51c-4b6f-bbd8-616db988bd33-StreamThread-1-consumer-72cd0319-8ca7-493c-891d-3022b235ea01 2.2.2.2/2.2.2.2 app-aaef2f83-d51c-4b6f-bbd8-616db988bd33-StreamThread-1-consumer 45
app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2-consumer-c1a16d64-8d49-4758-ab64-2af3cd9aef0f 3.3.3.3/3.3.3.3 app-1f6b14fc-685c-49fb-83c0-54e15bca15cb-StreamThread-2-consumer 45

上述命令的输出不断变化 - 从 0 到某个可变数字。理想情况下,它应该在一段时间后变得稳定。

是否有用于 kafka 流平衡(重新平衡)的任何可调参数/配置

问题:
  • 是什么导致应用程序在启动时无休止地重新平衡(即使没有错误/异常等)。
  • 是否有任何可以帮助我们避免重新平衡的可调参数?
  • 最佳答案

    查看您添加的日志,消费者 pod 正在启动,所以我猜可能其他 2 个 pod 会滚动重启,因此每次停止和启动时都会重新平衡。

    尽管 Kafka 在运行 rebalance 时速度并不快,因为在此过程中跨组聊天 - 尽管分区可能分配给一个消费者,但该组仅在所有消费者都分配完后才开始消费,并且只有分配的发现才会发生在 poll 方法中(见 https://chrisg23.blogspot.com/2020/02/why-is-pausing-kafka-consumer-so.html)。

    因此,加快进程的方法是更频繁地轮询,以便您更快地了解更改,但有一个权衡 - 如果在正常运行时主题不忙,那么将有很多无所事事的旋转。

    但是,您对无休止的含义并不十分清楚。如果您的意思是应用程序实际上只是重新平衡,那么请参阅我上面的评论。可能是 pod 不断上升和下降(心跳停止)或者轮询需要很长时间 - 您是否为每条记录进行了大量 I/O?从日志和 pod 名称可以明显看出重启。过度轮询也会导致警告消息,建议您增加 max.poll.interval.ms或减少 max.poll.records

    关于java - Kafka Streams 应用程序无休止的再平衡,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/61185849/

    24 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com