apache-spark - Spark Structured Streaming Kafka error: offset was changed


My Spark Structured Streaming application runs for a few hours and then fails with this error:

java.lang.IllegalStateException: Partition [partition-name] offset was changed from 361037 to 355053, some data may have been missed.
Some data may have been lost because they are not available in Kafka any more; either the
data was aged out by Kafka or the topic may have been deleted before all the data in the
topic was processed. If you don't want your streaming query to fail on such cases, set the
source option "failOnDataLoss" to "false".

Of course the offsets are different each time, but the first one is always greater than the second. The topic data cannot have aged out, because the topic's retention period is 5 days; I even recreated the topic yesterday, and the error appeared again today. The only way to recover from it is to delete the checkpoint.

Spark's Kafka integration guide mentions the failOnDataLoss option:

Whether to fail the query when it's possible that data is lost (e.g., topics are deleted, or offsets are out of range). This may be a false alarm. You can disable it when it doesn't work as you expected. Batch queries will always fail if it fails to read any data from the provided offsets due to lost data.



But I could not find any further information on when this can be considered a false alarm, so I don't know whether it is safe to set failOnDataLoss to false, or whether my cluster has an actual problem (in which case we would really be losing data).
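For reference, failOnDataLoss is an option on the Kafka source itself. The following is a minimal sketch in Scala; the broker address, topic name, and paths are hypothetical placeholders, not values from this question. The checkpointLocation on the sink is the checkpoint directory mentioned above, whose deletion resets the query's stored offsets.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kafka-offset-example") // hypothetical app name
  .getOrCreate()

// With failOnDataLoss=false the query logs a warning and skips ahead when
// offsets are out of range, instead of failing with IllegalStateException.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
  .option("subscribe", "my-topic")                   // placeholder topic
  .option("failOnDataLoss", "false")
  .load()

// The checkpointLocation directory is the "checkpoint" that had to be
// deleted to recover while failOnDataLoss was still true.
val query = df.writeStream
  .format("parquet")
  .option("path", "/data/out")                           // placeholder path
  .option("checkpointLocation", "/checkpoints/my-query") // placeholder path
  .start()

query.awaitTermination()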

Update: I have investigated the Kafka logs, and in every case where Spark failed, Kafka logged several messages like this one (I assume one per Spark consumer):
INFO [GroupCoordinator 1]: Preparing to rebalance group spark-kafka-...-driver-0 with old generation 1 (__consumer_offsets-25) (kafka.coordinator.group.GroupCoordinator)

Best Answer

I don't have this problem anymore. I made two changes:

  • Disabled YARN's dynamic resource allocation (which means I have to calculate the optimal number of executors etc. manually and pass the values to spark-submit)
  • Upgraded to Spark 2.4.0, which also upgraded the Kafka client from 0.10.0.1 to 2.0.0

Disabling dynamic resource allocation means that executors (= consumers) are no longer created and killed while the application runs, which removes the need for rebalancing. So this is most likely what made the error stop; a sketch of such a spark-submit invocation follows below.
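A sketch of what that spark-submit invocation could look like; the resource numbers, class name, and jar name are placeholders that would have to be sized for the actual cluster and workload:

# Dynamic allocation off: a fixed set of executors (= Kafka consumers)
# lives for the whole run, so no consumer-group rebalances are triggered.
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 8 \
  --executor-cores 4 \
  --executor-memory 8g \
  --class com.example.StreamingApp \
  streaming-app.jar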

Regarding this apache-spark question (Spark Structured Streaming Kafka error: offset was changed), a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55055548/
