
apache-kafka - What can cause a Kafka consumer to fail with "Failed to get offsets by times"?

Reposted. Author: 行者123. Updated: 2023-12-03 09:24:27

I have a Kafka consumer. It seems to work for a while, then it dies, and it does this repeatedly. I get this exception, with no other information.

org.apache.kafka.common.errors.TimeoutException:
Failed to get offsets by times in 305000 ms

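305000 ms is about 5 minutes (305 seconds). Any clues as to what might cause this, or steps I could take to find out?

In case it is relevant:

I have 3 processes on different machines, using the latest Java Kafka client, version 0.10.2.0. Each machine runs 20 threads, and each thread has its own Consumer. By design, when one thread dies, all threads are killed, the process dies, and it is then restarted. As a result, about 20 consumers die and restart at the same time, which triggers a rebalance. So the clients may well be interfering with each other periodically. That does not explain why I get this exception in the first place, though.

I have three Kafka machines and three Zookeeper machines. Each client has all 3 Kafka machines configured in its bootstrap.servers. The topic has 200 partitions, which means each thread is assigned roughly 3 partitions (200 partitions spread over 60 consumers). The topic has a replication factor of 2.

There are no errors in the Kafka or Zookeeper logs.

The following configuration values are set, and no others.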
  • bootstrap.servers
  • group.id
  • key.deserializer
  • value.deserializer
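
For illustration, here is a minimal sketch of how those four settings might be assembled in Java. The post does not give the actual values, so the host names, the group name, and the choice of `StringDeserializer` are all assumptions. The sketch deliberately stops short of creating a consumer so that it runs with the JDK alone.

```java
import java.util.Properties;

public class MinimalConsumerConfig {
    // Builds the four settings listed in the question. Every value below is a
    // hypothetical placeholder; none of them comes from the original post.
    static Properties build() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka1:9092,kafka2:9092,kafka3:9092");
        props.setProperty("group.id", "example-group");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        // With kafka-clients on the classpath, these properties would be passed
        // to the KafkaConsumer constructor; that step is left out here.
        props.stringPropertyNames().stream().sorted()
             .forEach(k -> System.out.println(k + "=" + props.getProperty(k)));
    }
}
```

Because nothing else is set, every timeout in such a client is left at the library default for that client version.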

Best answer

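    I ran into this today. I see two different versions of this error message, depending on whether I use the Kafka 1.0 client library or the Kafka 2.0 client library. With the 1.0 client the message is "org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times in 305000 ms", and with the 2.0 client library it is "org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times in 30003ms".

    I got this message when trying to monitor offsets/lag with the kafka-consumer-groups command (e.g. kafka-consumer-groups --bootstrap-server {servers} --group {group} --describe ). These commands are part of the kafka/confluent tools, but I imagine the same thing could happen with other clients.

    The problem turned out to be that I had a topic with a replication factor of 1 with partitions that had no assigned leader. The only way I found this out was by updating the file {kafka_client_dir}\libexec\config\tools-log4j.properties to log at DEBUG level: log4j.rootLogger=DEBUG, stderr
    Note that this is the log4j configuration file for the kafka/confluent tools, so YMMV for other clients. I was running them from my Mac.

    Once that was done, I saw the following message in the output, which alerted me to the ISR/offlineReplicas problem: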

                 [2019-01-28 11:41:54,290] DEBUG Updated cluster metadata version 2 to Cluster(id = 0B1zi_bbQVyrfKwqiDa2kw, 
    nodes = [
    brokerServer3:9092 (id: 3 rack: null),
    brokerServer6:9092 (id: 6 rack: null),
    brokerServer1:9092 (id: 1 rack: null),
    brokerServer5:9092 (id: 5 rack: null),
    brokerServer4:9092 (id: 4 rack: null)], partitions = [

    Partition(topic = myTopicWithReplicatinFactorOne, partition = 10, leader = 6, replicas = [6], isr = [6], offlineReplicas = []),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 11, leader = 1, replicas = [1], isr = [1], offlineReplicas = []),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 12, leader = none, replicas = [2], isr = [], offlineReplicas = [2]),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 13, leader = 3, replicas = [3], isr = [3], offlineReplicas = []),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 14, leader = 4, replicas = [4], isr = [4], offlineReplicas = []),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 2, leader = 4, replicas = [4], isr = [4], offlineReplicas = []),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 3, leader = 5, replicas = [5], isr = [5], offlineReplicas = []),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 4, leader = 6, replicas = [6], isr = [6], offlineReplicas = []),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 5, leader = 1, replicas = [1], isr = [1], offlineReplicas = []),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 6, leader = none, replicas = [2], isr = [], offlineReplicas = [2]),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 7, leader = 3, replicas = [3], isr = [3], offlineReplicas = []),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 8, leader = 4, replicas = [4], isr = [4], offlineReplicas = []),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 9, leader = 5, replicas = [5], isr = [5], offlineReplicas = []),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 0, leader = none, replicas = [2], isr = [], offlineReplicas = [2]),
    Partition(topic = myTopicWithReplicatinFactorOne, partition = 1, leader = 3, replicas = [3], isr = [3], offlineReplicas = [])
    ], controller = brokerServer4:9092 (id: 4 rack: null)) (org.apache.kafka.clients.Metadata)

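    You can see offlineReplicas = [2] above, which hints at the problem: partitions 0, 6 and 12 have leader = none and an empty isr. Also, brokerServer2 is missing from the list of nodes.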
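
    With many partitions the affected entries are easy to miss by eye. The following is a rough sketch, not part of the original answer, that scans text in the same format as the DEBUG output above and lists the partitions whose leader is none. The class name `LeaderlessPartitions` and its `find` method are hypothetical helpers, and the regular expression assumes the exact `Partition(topic = ..., partition = ..., leader = ...` layout shown in that log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeaderlessPartitions {
    // Matches the start of one Partition(...) entry as printed in the
    // metadata DEBUG line: topic name, partition number, leader id or "none".
    private static final Pattern PARTITION = Pattern.compile(
            "Partition\\(topic = ([^,\\s]+), partition = (\\d+), leader = (\\w+),");

    // Returns "topic-partition" for every entry whose leader is "none",
    // in the order the entries appear in the log text.
    static List<String> find(String metadataLog) {
        List<String> result = new ArrayList<>();
        Matcher m = PARTITION.matcher(metadataLog);
        while (m.find()) {
            if ("none".equals(m.group(3))) {
                result.add(m.group(1) + "-" + m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three entries copied from the log output above.
        String sample = String.join("\n",
            "Partition(topic = myTopicWithReplicatinFactorOne, partition = 11, leader = 1, replicas = [1], isr = [1], offlineReplicas = []),",
            "Partition(topic = myTopicWithReplicatinFactorOne, partition = 12, leader = none, replicas = [2], isr = [], offlineReplicas = [2]),",
            "Partition(topic = myTopicWithReplicatinFactorOne, partition = 6, leader = none, replicas = [2], isr = [], offlineReplicas = [2]),");
        // Prints [myTopicWithReplicatinFactorOne-12, myTopicWithReplicatinFactorOne-6]
        System.out.println(find(sample));
    }
}
```

    This only reads log text that the client already produced; it does not query the cluster itself.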

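    In the end I restarted the affected broker ( brokerServer2 ) to bring it back in sync, and once that was done I could use the command-line tools again without any problem. There may be a better fix than restarting the broker, but it did resolve the issue.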

    Source: the original question on Stack Overflow, "apache-kafka - What can cause a Kafka consumer to fail with "Failed to get offsets by times"?": https://stackoverflow.com/questions/44935145/
