java - How can I run hundreds of Kafka consumers on the same machine?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-02 19:43:54

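The Kafka documentation mentions that consumers are not thread-safe. To avoid this problem, I read that it is a good idea to run one consumer per Java process. How can this be achieved?

The number of consumers is not fixed; it can change as needed.

Thanks, Alessio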

Best answer

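You are right that the documentation says the Kafka consumer is not thread-safe. However, it also says that you should run consumers on separate threads, not separate processes. That is quite different. For a more specific answer for Java/the JVM, see: https://stackoverflow.com/a/15795159/236528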

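In general, you can have as many consumers as you like on a Kafka topic. Some of them may share a group id, in which case all of the topic's partitions are distributed among the consumers in that group that are active at any given time.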
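To make the group behaviour concrete, here is a minimal sketch of how a fixed set of partitions might be spread over the consumers of one group. It is a deliberately simplified round-robin, not one of Kafka's actual assignors, and the class and method names (PartitionSpread, assign) are made up for illustration. The point it shows is that each partition goes to exactly one consumer in the group, so consumers beyond the partition count receive nothing.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSpread {

    /**
     * Deals partitions 0..partitionCount-1 to the consumers in turn.
     * Simplified illustration only; Kafka's real assignors differ in detail.
     */
    public static List<List<Integer>> assign(int partitionCount, int consumerCount) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int c = 0; c < consumerCount; c++) {
            assignment.add(new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            // Each partition is owned by exactly one consumer in the group.
            assignment.get(p % consumerCount).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(6, 4)); // six partitions, four consumers
        System.out.println(assign(2, 3)); // more consumers than partitions
    }
}
```

With two partitions and three consumers, the third consumer's list stays empty, which is why adding consumers to a group beyond the number of partitions does not add parallelism.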

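The Javadoc for the Kafka consumer has more detail and is linked at the bottom of this answer, but I have copied below the two thread/consumer models that the documentation suggests.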

1. One Consumer Per Thread

A simple option is to give each thread its own consumer instance. Here are the pros and cons of this approach:

PRO: It is the easiest to implement

PRO: It is often the fastest as no inter-thread co-ordination is needed

PRO: It makes in-order processing on a per-partition basis very easy to implement (each thread just processes messages in the order it receives them).

CON: More consumers means more TCP connections to the cluster (one per thread). In general Kafka handles connections very efficiently so this is generally a small cost.

CON: Multiple consumers means more requests being sent to the server and slightly less batching of data which can cause some drop in I/O throughput.

CON: The number of total threads across all processes will be limited by the total number of partitions.
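A rough sketch of the threading shape of option 1 follows. To keep it runnable without the Kafka client library, it uses a hypothetical RecordSource interface as a stand-in for KafkaConsumer; runWorkers and fixedSource are likewise invented names. In real code the factory would build a new KafkaConsumer from its own Properties, subscribe it, and the loop would call poll(Duration) until shutdown. What matters is that each consumer is created, used, and closed by the same thread and is never shared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntFunction;

public class OneConsumerPerThread {

    /** Hypothetical stand-in for KafkaConsumer: poll() returns a batch, empty when done. */
    public interface RecordSource extends AutoCloseable {
        List<String> poll();

        @Override
        void close();
    }

    /** Starts one thread per consumer; each thread owns its consumer exclusively. */
    public static List<String> runWorkers(int threadCount, IntFunction<RecordSource> factory) {
        ConcurrentLinkedQueue<String> processed = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            final int id = i;
            pool.submit(() -> {
                // The consumer is created inside the thread and never escapes it.
                try (RecordSource consumer = factory.apply(id)) {
                    List<String> batch;
                    while (!(batch = consumer.poll()).isEmpty()) {
                        for (String record : batch) {
                            // Records are handled in the order this thread received them.
                            processed.add("worker-" + id + ":" + record);
                        }
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(processed);
    }

    /** Toy source: yields the given number of single-record batches, then an empty batch. */
    public static RecordSource fixedSource(int id, int batches) {
        return new RecordSource() {
            private int remaining = batches;

            @Override
            public List<String> poll() {
                return remaining-- > 0 ? List.of("p" + id + "-msg" + remaining) : List.of();
            }

            @Override
            public void close() { }
        };
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(3, id -> fixedSource(id, 4)).size());
    }
}
```

Because nothing is shared between the worker threads except the result collection, no locking around the consumer is needed, which is the main attraction of this model.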

2. Decouple Consumption and Processing

Another alternative is to have one or more consumer threads that do all data consumption and hands off ConsumerRecords instances to a blocking queue consumed by a pool of processor threads that actually handle the record processing. This option likewise has pros and cons:

PRO: This option allows independently scaling the number of consumers and processors. This makes it possible to have a single consumer that feeds many processor threads, avoiding any limitation on partitions.

CON: Guaranteeing order across the processors requires particular care as the threads will execute independently an earlier chunk of data may actually be processed after a later chunk of data just due to the luck of thread execution timing. For processing that has no ordering requirements this is not a problem.

CON: Manually committing the position becomes harder as it requires that all threads co-ordinate to ensure that processing is complete for that partition.

There are many possible variations on this approach. For example each processor thread can have its own queue, and the consumer threads can hash into these queues using the TopicPartition to ensure in-order consumption and simplify commit.
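The per-queue variation can be sketched with plain java.util.concurrent classes. This is an illustration under assumptions, not the Javadoc's implementation: Rec is a hypothetical stand-in for ConsumerRecord, the partition number stands in for a TopicPartition hash, and dispatch plays the role of the single consumer thread. Offset commits are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PartitionedHandoff {

    /** Hypothetical stand-in for ConsumerRecord: a partition number and a value. */
    public static final class Rec {
        final int partition;
        final String value;

        public Rec(int partition, String value) {
            this.partition = partition;
            this.value = value;
        }
    }

    /** Sentinel that tells a worker to stop; compared by reference. */
    private static final Rec POISON = new Rec(-1, "");

    /** Routes each record to the queue chosen by its partition; returns what each worker saw. */
    public static List<List<String>> dispatch(List<Rec> polled, int workers) {
        List<BlockingQueue<Rec>> queues = new ArrayList<>();
        List<List<String>> results = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            BlockingQueue<Rec> queue = new LinkedBlockingQueue<>();
            List<String> seen = new ArrayList<>(); // written only by this worker until join()
            queues.add(queue);
            results.add(seen);
            Thread t = new Thread(() -> {
                try {
                    for (Rec r = queue.take(); r != POISON; r = queue.take()) {
                        seen.add(r.partition + ":" + r.value);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            threads.add(t);
        }
        // "Consumer thread": the same partition always lands on the same queue,
        // so records of one partition are processed in the order they were polled.
        for (Rec r : polled) {
            queues.get(Math.floorMod(r.partition, workers)).add(r);
        }
        for (BlockingQueue<Rec> q : queues) {
            q.add(POISON);
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Rec> polled = List.of(new Rec(0, "a"), new Rec(1, "a"), new Rec(2, "a"),
                new Rec(0, "b"), new Rec(1, "b"));
        System.out.println(dispatch(polled, 2));
    }
}
```

The queues here are unbounded for simplicity; a bounded queue filled with put() would add back-pressure on the consumer thread. A real version would also have to track, per partition, which offsets the workers have finished before committing.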

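In my experience, option #1 is the best one to start with; move up to option #2 only if you really need it. Option #2 is the only way to extract maximum performance from the Kafka consumer, but it is more complex to implement. So try option #1 first and see whether it is sufficient for your specific use case.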

The full Javadoc is available at: https://kafka.apache.org/23/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html

Regarding "java - How can I run hundreds of Kafka consumers on the same machine?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56915044/
