gpt4 book ai didi

apache-kafka - 当我退回代理时,Apache Kafka 会丢失一些消费者补偿

转载 作者:行者123 更新时间:2023-12-03 15:15:05 26 4
gpt4 key购买 nike

卡夫卡 1.1.1-cp1。 (编辑 4:我最终向 Kafka 提交了一个关于此的错误 - https://issues.apache.org/jira/browse/KAFKA-7447)

我有 3 个经纪人,与 min.insync.replicas = 2对于所有主题,以及 offsets.commit.required.acks = -1 .

当我停止其中一个代理时,正如您所期望的那样,它会移交其领导的分区,并且一切都照常进行(消费者消费,生产者生产)。

当我把经纪人带回来时,问题就开始了。似乎发生的是它在集群中引起困惑,并且一些 __consumer_offset 主题立即被截断为 0。

以下是从受影响的 __consumer_offset 分区(最初由发生故障的代理领导的分区)中按时间顺序选择的日志。这个故事在来自所有三个经纪人的日志中上演。

本质上,我反弹的broker回来了,貌似不能理解新leader是什么意思,截断为0,然后说服其他replica也截断为0。

prod-kafka-2:(刚刚启动)

[2018-09-17 09:21:46,246] WARN [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Based on follower's leader epoch, leader replied with an unknown offset in __consumer_offsets-29. The initial fetch offset 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)

prod-kafka-3:(见replica1回来)
[2018-09-17 09:22:02,027] INFO [Partition __consumer_offsets-29 broker=2] Expanding ISR from 0,2 to 0,2,1 (kafka.cluster.Partition)

prod-kafka-2:
[2018-09-17 09:22:33,892] INFO [GroupMetadataManager brokerId=1] Scheduling unloading of offsets and group metadata from __consumer_offsets-29 (kafka.coordinator.group.GroupMetadataManager)
[2018-09-17 09:22:33,902] INFO [GroupMetadataManager brokerId=1] Finished unloading __consumer_offsets-29. Removed 0 cached offsets and 0 cached groups. (kafka.coordinator.group.GroupMetadataManager)
[2018-09-17 09:24:03,287] INFO [ReplicaFetcherManager on broker 1] Removed fetcher for partitions __consumer_offsets-29 (kafka.server.ReplicaFetcherManager)
[2018-09-17 09:24:03,287] INFO [Partition __consumer_offsets-29 broker=1] __consumer_offsets-29 starts at Leader Epoch 78 from offset 0. Previous Leader Epoch was: 77 (kafka.cluster.Partition)
[2018-09-17 09:24:03,287] INFO [GroupMetadataManager brokerId=1] Scheduling loading of offsets and group metadata from __consumer_offsets-29 (kafka.coordinator.group.GroupMetadataManager)
[2018-09-17 09:24:03,288] INFO [GroupMetadataManager brokerId=1] Finished loading offsets and group metadata from __consumer_offsets-29 in 0 milliseconds. (kafka.coordinator.group.GroupMetadataManager)

prod-kafka-3:努力同意 prod-kafka-2。将其踢出 ISR,但随后与 ZooKeeper 发生冲突。也许 2 和 3 都认为他们是领导者?
[2018-09-17 09:24:15,372] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:24:15,377] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)

prod-kafka-2:粗暴地将另外两个副本从 ISR 列表中踢出,即使 2 是我们刚刚重新启动的副本,因此很可能落后。 (请记住,它已经决定将主题截断为 0!)
[2018-09-17 09:24:16,481] INFO [Partition __consumer_offsets-29 broker=1] Shrinking ISR from 0,2,1 to 1 (kafka.cluster.Partition)

prod-kafka-3:仍在与动物园管理员战斗。最后输了。
[2018-09-17 09:24:20,374] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:24:20,378] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:24:25,347] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:24:25,350] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:24:30,359] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:24:30,362] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:24:35,365] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:24:35,368] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:24:40,352] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:24:40,354] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:24:45,422] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:24:45,425] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:24:50,345] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:24:50,348] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:24:55,444] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:24:55,449] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:25:00,340] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:25:00,343] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:25:05,374] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:25:05,377] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:25:10,342] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:25:10,344] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:25:15,348] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:25:15,351] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:25:20,338] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:25:20,340] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:25:25,338] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:25:25,340] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:25:30,382] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:25:30,387] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:25:35,341] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:25:35,344] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:25:40,460] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:25:40,465] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
[2018-09-17 09:25:45,335] INFO [Partition __consumer_offsets-29 broker=2] Shrinking ISR from 0,2,1 to 0,2 (kafka.cluster.Partition)
[2018-09-17 09:25:45,338] INFO [Partition __consumer_offsets-29 broker=2] Cached zkVersion [1582] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)

prod-kafka-1:突然变得困惑并重新初始化为 0,因为 prod-kafka-2 显然成为领导者。
[2018-09-17 09:25:48,807] INFO [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Remote broker is not the leader for partition __consumer_offsets-29, which could indicate that the partition is being moved (kafka.server.ReplicaFetcherThread)

prod-kafka-3:最终决定 prod-kafka-2 负责,相应地截断
[2018-09-17 09:25:48,806] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions __consumer_offsets-29 (kafka.server.ReplicaFetcherManager)
[2018-09-17 09:25:48,807] INFO [ReplicaFetcherManager on broker 2] Added fetcher for partitions List([__consumer_offsets-29, initOffset 0 to broker BrokerEndPoint(1,prod-kafka-2.c.i-lastfm-prod.internal,9092)] ) (kafka.server.ReplicaFetcherManager)
[2018-09-17 09:25:48,809] INFO [GroupMetadataManager brokerId=2] Scheduling unloading of offsets and group metadata from __consumer_offsets-29 (kafka.coordinator.group.GroupMetadataManager)
[2018-09-17 09:25:48,810] INFO [GroupMetadataManager brokerId=2] Finished unloading __consumer_offsets-29. Removed 0 cached offsets and 0 cached groups. (kafka.coordinator.group.GroupMetadataManager)
[2018-09-17 09:25:48,950] WARN [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Based on follower's leader epoch, leader replied with an unknown offset in __consumer_offsets-29. The initial fetch offset 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
[2018-09-17 09:25:48,951] INFO [Log partition=__consumer_offsets-29, dir=/var/lib/kafka/data] Truncating to 0 has no effect as the largest offset in the log is -1 (kafka.log.Log)

prod-kafka-1:领导层就职典礼确认。
[2018-09-17 09:25:50,207] INFO [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Remote broker is not the leader for partition __consumer_offsets-29, which could indicate that the partition is being moved (kafka.server.ReplicaFetcherThread)

prod-kafka-2:现在它已经通过zookeeper 宣称其主导地位,prod-kafka-3 添加到 ISR 列表中
[2018-09-17 09:25:50,210] INFO [Partition __consumer_offsets-29 broker=1] Expanding ISR from 1 to 1,2 (kafka.cluster.Partition)

prod-kafka-1:仍在努力接受现实,但最终也被截断为 0。
[2018-09-17 09:25:51,430] INFO [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Remote broker is not the leader for partition __consumer_offsets-29, which could indicate that the partition is being moved (kafka.server.ReplicaFetcherThread)
[2018-09-17 09:25:52,615] INFO [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Remote broker is not the leader for partition __consumer_offsets-29, which could indicate that the partition is being moved (kafka.server.ReplicaFetcherThread)
[2018-09-17 09:25:53,637] INFO [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Remote broker is not the leader for partition __consumer_offsets-29, which could indicate that the partition is being moved (kafka.server.ReplicaFetcherThread)
[2018-09-17 09:25:54,150] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions __consumer_offsets-29 (kafka.server.ReplicaFetcherManager)
[2018-09-17 09:25:54,151] INFO [ReplicaFetcherManager on broker 0] Added fetcher for partitions List([__consumer_offsets-29, initOffset 0 to broker BrokerEndPoint(1,prod-kafka-2.c.i-lastfm-prod.internal,9092)] ) (kafka.server.ReplicaFetcherManager)
[2018-09-17 09:25:54,151] INFO [GroupMetadataManager brokerId=0] Scheduling unloading of offsets and group metadata from __consumer_offsets-29 (kafka.coordinator.group.GroupMetadataManager)
[2018-09-17 09:25:54,153] INFO [GroupMetadataManager brokerId=0] Finished unloading __consumer_offsets-29. Removed 0 cached offsets and 0 cached groups. (kafka.coordinator.group.GroupMetadataManager)
[2018-09-17 09:25:54,261] WARN [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Based on follower's leader epoch, leader replied with an unknown offset in __consumer_offsets-29. The initial fetch offset 0 will be used for truncation. (kafka.server.ReplicaFetcherThread)
[2018-09-17 09:25:54,261] INFO [Log partition=__consumer_offsets-29, dir=/var/lib/kafka/data] Truncating to 0 has no effect as the largest offset in the log is -1 (kafka.log.Log)

prod-kafka-2:完成了消费者偏移的妙招,现在全部为 0。
[2018-09-17 09:25:56,244] INFO [Partition __consumer_offsets-29 broker=1] Expanding ISR from 1,2 to 1,2,0 (kafka.cluster.Partition)

编辑:

根据要求,这里是 kafka server.properties 文件:
broker.id=1
default.replication.factor=3
auto.create.topics.enable=false
min.insync.replicas=2
num.network.threads=12
num.io.threads=16
num.replica.fetchers=6
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/var/lib/kafka/data
num.partitions=1
num.recovery.threads.per.data.dir=4
offsets.retention.minutes=10080
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
log.flush.interval.messages=20000
log.flush.interval.ms=10000
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=60000
zookeeper.connect=prod-kafka-1:2181,prod-kafka-2:2181,prod-kafka-3:2181
zookeeper.connection.timeout.ms=6000
confluent.support.metrics.enable=false
confluent.support.customer.id=anonymous
group.initial.rebalance.delay.ms=3000

这是 zookeeper.properties 文件:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=prod-kafka-1:2888:3888
server.2=prod-kafka-2:2888:3888
server.3=prod-kafka-3:2888:3888
autopurge.purgeInterval=12
autopurge.snapRetainCount=6

编辑 2
升级到 Kafka-2.0.0 似乎没有解决问题。

可能是我的传入速率太高,当我知道我崩溃的服务器即将恢复时,我需要限制生产者?听起来对吗……?

编辑 3
设置 auto.leader.rebalance.enable=false解决了问题,但现在我必须手动重新平衡。然而,当所有分区都被 catch 时手动重新平衡似乎不会造成任何问题。

最佳答案

自从提出这个问题以来,已经取得了一些进展。正如评论中指出的 jira Kafka-7447是为了跟踪这个问题而创建的。虽然那个问题还没有具体解决,但是有几个人提到这个问题与Kafka-8896有关。现在已经解决了。 (那些遇到原始问题的人,在使用包含此改进的版本后不再有问题。)
因此,问题已解决,使用 Kafka 2.2.2 及更高版本(或任何带有该补丁的版本)应确保您不会遇到此问题。

关于apache-kafka - 当我退回代理时,Apache Kafka 会丢失一些消费者补偿,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/52367825/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com