
apache-flink - Apache Flink - What is the difference between checkpoints and savepoints?


Can someone help me understand the difference between Apache Flink's checkpoints and savepoints?

I read the documentation but still could not figure out the difference! :s

Best Answer

Apache Flink's checkpoints and savepoints are similar in that both are mechanisms for saving the internal state of a Flink application.

Checkpoints are taken automatically and are used to restart the job automatically in case of a failure.

Savepoints, on the other hand, are taken manually, are always stored externally, and are used to start a "new" job with the previous internal state in cases such as:

  • bug fixes
  • Flink version upgrades
  • A/B testing, etc.

Under the hood they are in fact the same mechanism/code path, with some subtle differences.
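As a rough sketch of how this split looks in practice (the class name, the 60-second interval, and the operator names below are illustrative, not taken from the original answer): checkpointing is enabled once in the job code and then runs automatically, while a savepoint is something you trigger yourself later.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoints: taken automatically by Flink at the configured interval;
        // they exist only so this running job can recover from failures.
        env.enableCheckpointing(60_000); // every 60 seconds (illustrative)

        env.fromElements("a", "b", "c")
           .map(String::toUpperCase)
           // A stable operator ID lets saved state be matched back to this
           // operator when the job is later restarted from a savepoint.
           .uid("uppercase-map")
           .print();

        env.execute("checkpointed-job");
    }
}
```

Savepoints, by contrast, are taken on demand from the CLI, e.g. `bin/flink savepoint <jobId> [targetDirectory]`, and the "new" job (bug fix, version upgrade, A/B variant) is then started from that state with `bin/flink run -s <savepointPath> ...`.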

Edit:

The official documentation at https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#what-is-a-savepoint-how-is-a-savepoint-different-from-a-checkpoint also gives a good explanation:

    A Savepoint is a consistent image of the execution state of a streaming job, created via Flink’s checkpointing mechanism. You can use Savepoints to stop-and-resume, fork, or update your Flink jobs. Savepoints consist of two parts: a directory with (typically large) binary files on stable storage (e.g. HDFS, S3, …) and a (relatively small) meta data file. The files on stable storage represent the net data of the job’s execution state image. The meta data file of a Savepoint contains (primarily) pointers to all files on stable storage that are part of the Savepoint, in form of absolute paths. Attention: In order to allow upgrades between programs and Flink versions, it is important to check out the following section about assigning IDs to your operators.

    Conceptually, Flink’s Savepoints are different from Checkpoints in a similar way that backups are different from recovery logs in traditional database systems. The primary purpose of Checkpoints is to provide a recovery mechanism in case of unexpected job failures. A Checkpoint’s lifecycle is managed by Flink, i.e. a Checkpoint is created, owned, and released by Flink - without user interaction. As a method of recovery and being periodically triggered, two main design goals for the Checkpoint implementation are i) being as lightweight to create and ii) being as fast to restore from as possible. Optimizations towards those goals can exploit certain properties, e.g. that the job code doesn’t change between the execution attempts. Checkpoints are usually dropped after the job was terminated by the user (except if explicitly configured as retained Checkpoints).

    In contrast to all this, Savepoints are created, owned, and deleted by the user. Their use-case is for planned, manual backup and resume. For example, this could be an update of your Flink version, changing your job graph, changing parallelism, forking a second job like for a red/blue deployment, and so on. Of course, Savepoints must survive job termination. Conceptually, Savepoints can be a bit more expensive to produce and restore and focus more on portability and support for the previously mentioned changes to the job.

    Those conceptual differences aside, the current implementations of Checkpoints and Savepoints are basically using the same code and produce the same format. However, there is currently one exception from this, and we might introduce more differences in the future. The exception are incremental checkpoints with the RocksDB state backend. They are using some RocksDB internal format instead of Flink’s native savepoint format. This makes them the first instance of a more lightweight checkpointing mechanism, compared to Savepoints.
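The retained checkpoints and incremental RocksDB checkpoints mentioned in the quote can be enabled roughly as sketched below. This is an assumption-laden sketch against the Flink 1.x DataStream API that the linked docs describe (it assumes the flink-statebackend-rocksdb dependency is on the classpath, the checkpoint directory is a placeholder, and newer Flink versions expose the same options via EmbeddedRocksDBStateBackend and checkpoint-storage configuration instead).

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetainedIncrementalCheckpoints {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.enableCheckpointing(30_000);

        // "Retained" checkpoints: keep completed checkpoints after the job is
        // cancelled instead of letting Flink delete them, so they can be used
        // for a manual restart much like a savepoint.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // RocksDB state backend with incremental checkpoints (second argument):
        // only the RocksDB files changed since the last checkpoint are uploaded,
        // which is the lighter-weight, RocksDB-specific format the docs contrast
        // with Flink's native savepoint format. The path is a placeholder.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));

        env.fromElements(1, 2, 3)
           .map(i -> i * 2)
           .uid("double")
           .print();

        env.execute("retained-incremental-checkpoints");
    }
}
```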

This Q&A is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/45603953/
