
apache-kafka - Kafka consumer design for processing high-volume data with multiple instances

Reposted. Author: 行者123. Updated: 2023-12-04 07:53:51

I am trying to design a Kafka consumer, but I have hit a roadblock on how to design the flow. I am considering two options:

1. Process records directly from Kafka.
2. Write records from Kafka to a staging table, then process them.
Approach 1: Process messages from Kafka directly as they arrive:
• Read messages one at a time from Kafka; if there are no records to process, break the loop (the number of messages to process is configurable).
• Execute business rules.
• Apply changes to the consumer database.
• Update the Kafka offset after processing the message.
• Insert into a staging table (used for the PD guide later on).
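The loop in the steps above can be sketched as follows. This is a minimal illustration, not the asker's actual code: an in-memory Deque stands in for the Kafka consumer so the control flow is visible, and the names `applyBusinessRules` and `applyToConsumerDatabase` are hypothetical placeholders. In a real consumer you would call `KafkaConsumer.poll()` and `commitSync()` instead.

```java
import java.util.Deque;

// Sketch of Approach 1: read up to a configurable number of messages,
// apply business rules, write to the database, then "commit" the offset.
// The Deque stands in for the Kafka consumer for illustration only.
public class DirectProcessingLoop {
    private long committedOffset = 0;   // offset of the next record to read
    private int processedCount = 0;

    public void run(Deque<String> source, int maxMessages) {
        int handled = 0;
        while (handled < maxMessages) {
            String record = source.pollFirst();   // read one message at a time
            if (record == null) break;            // no records -> break the loop
            applyBusinessRules(record);
            applyToConsumerDatabase(record);
            committedOffset++;                    // commit only after processing
            handled++;
        }
    }

    private void applyBusinessRules(String record) { /* validate / transform */ }

    private void applyToConsumerDatabase(String record) { processedCount++; }

    public long committedOffset() { return committedOffset; }
    public int processedCount() { return processedCount; }
}
```

Committing only after the database write (the last two steps) is what gives at-least-once semantics: a crash mid-batch re-delivers unprocessed records.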
Problems with the approach above:
• Is it OK to subscribe to a partition and keep the lock open on the Kafka partition until the configurable number of messages is processed, then apply business rules and apply changes to the database? All of this happens in the same process; are there any performance issues doing it this way?
• Is it OK to manually commit the offset to Kafka? (Are there performance issues with manual offset commits?)
Approach 2: Write from Kafka to a staging table and process records from there:
Process 1: Consume events from Kafka and insert them into a staging table.
Process 2: Read the staging table (a configurable number of rows), execute business rules, apply changes to the consumer database, and update the status of processed records in the staging table. (We may have multiple processes performing this step.)
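The two-process flow above can be sketched with an in-memory list standing in for the staging table. Everything here is illustrative: the `Status` enum, `insert`, and `processBatch` are assumed names, and a real database would need row-level locking (e.g. `SELECT ... FOR UPDATE SKIP LOCKED`) so multiple worker instances do not pick the same rows.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Approach 2 with an in-memory list standing in for the
// staging table. Rows go PENDING -> DONE once processed.
public class StagingTable {
    enum Status { PENDING, DONE }

    static final class Row {
        final String payload;
        Status status = Status.PENDING;
        Row(String payload) { this.payload = payload; }
    }

    private final List<Row> rows = new ArrayList<>();

    // Process 1: consume an event from Kafka and insert it into the table.
    public synchronized void insert(String payload) { rows.add(new Row(payload)); }

    // Process 2: read a configurable number of pending rows, apply
    // business rules, and update the status of processed records.
    public synchronized int processBatch(int batchSize) {
        int done = 0;
        for (Row r : rows) {
            if (done == batchSize) break;
            if (r.status == Status.PENDING) {
                // execute business rules + apply consumer DB changes here
                r.status = Status.DONE;
                done++;
            }
        }
        return done;
    }

    public synchronized long pendingCount() {
        return rows.stream().filter(r -> r.status == Status.PENDING).count();
    }
}
```

The `synchronized` methods here hint at exactly the locking and blocking concern raised below: every worker instance contends on the same table.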
I see many drawbacks in this approach:
• We lose the advantage of the offset handling provided by Kafka and must manually update the status of processed records in the staging table.
• Locking and blocking on the staging table with multiple instances, since we insert and then update the same staging table after processing. (Note: I could design separate tables, move the data there, and process it, but that introduces multiple processes again.)
How should I design a Kafka consumer with multiple instances and a large volume of data to process? Which design is better: reading from Kafka and processing the messages directly, or staging them in a table and writing another job to process them?

Best answer

Here is how I think we can get the best throughput without worrying about message loss:

  • Maximize the number of partitions.
  • Deploy consumers (up to the number of partitions; fewer if your consumers can run multithreaded without problems).
  • Read single-threaded from each consumer (with auto offset commit) and put messages into a blocking queue, which you can size according to the number of actual processing threads in each consumer.
  • If processing fails, you can retry until success or put the message into a dead-letter queue. Do not forget a graceful-shutdown implementation to handle messages that have already been consumed.
  • If you want to guarantee ordering for events with the same key, one after another, or by any other criterion within a partition, you can use a deterministic executor. I wrote a basic ExecutorService in Java that executes multiple messages deterministically without affecting multithreaded processing of logically separate events. Link: https://github.com/mukulbansal93/deterministic-threading
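The blocking-queue and deterministic-ordering ideas above can be sketched together: one reader thread hands each message to a worker chosen by key hash, so events with the same key always run in order on the same single-threaded executor while different keys run in parallel. This is a simplified illustration of the idea behind the linked deterministic-threading library, not its actual API; the class and method names are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// One single-threaded executor per "slot"; all messages with the same
// key hash to the same slot, preserving per-key ordering while keys
// that hash to different slots are processed in parallel.
public class KeyedDispatcher implements AutoCloseable {
    private final ExecutorService[] workers;

    public KeyedDispatcher(int parallelism) {
        workers = new ExecutorService[parallelism];
        for (int i = 0; i < parallelism; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Deterministic dispatch: same key -> same executor, every time.
    public void dispatch(String key, Runnable task) {
        int slot = Math.floorMod(key.hashCode(), workers.length);
        workers[slot].execute(task);
    }

    @Override
    public void close() {
        for (ExecutorService w : workers) w.shutdown();   // drain, then stop
        for (ExecutorService w : workers) {
            try { w.awaitTermination(5, TimeUnit.SECONDS); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }
}
```

The graceful `close()` here is the "do not forget shutdown handling" point from the list: it drains already-consumed messages before the process exits.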

  • To answer your questions:
  • Is it OK to subscribe to a partition and keep the lock open on the Kafka partition until the configurable number of messages is processed, then apply business rules and apply changes to the database, all in the same process; any performance issues doing it this way? I do not see much of a performance problem here, since you are processing in batches. However, one of the consumed messages may take a long time while the others wait to be processed; in that case you will not read further messages from Kafka, which becomes a performance bottleneck.
  • Is it OK to manually commit the offset to Kafka? (Performance issues with manual offset commits.) This is definitely the lowest-throughput approach, since an offset commit is an expensive operation.
  • Regarding apache-kafka - Kafka consumer design for processing high-volume data with multiple instances, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66805903/
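For reference, the auto-offset-commit and batch-size settings discussed in the answer map to standard Kafka consumer configuration keys (`enable.auto.commit`, `auto.commit.interval.ms`, `max.poll.records`). A minimal sketch, with placeholder broker address and group id:

```java
import java.util.Properties;

// Consumer settings relevant to the answer: automatic offset commits
// (the higher-throughput option) and a bounded batch size per poll().
// "localhost:9092" and "my-consumer-group" are placeholders.
public class ConsumerProps {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");     // placeholder
        props.put("group.id", "my-consumer-group");           // placeholder
        props.put("enable.auto.commit", "true");        // let Kafka commit offsets
        props.put("auto.commit.interval.ms", "5000");   // commit every 5 seconds
        props.put("max.poll.records", "500");           // cap records per poll()
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```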
