What does "externalized checkpoints" mean?
Question
Flink checkpointing has a concept called externalized checkpoints. What does "externalized" mean here? Is there a corresponding concept, perhaps called internal checkpoints?
Even if I don't call the enableExternalizedCheckpoints method, when I specify a checkpoint path on HDFS I believe I am persisting the checkpoints externally. Can I say that I am doing externalized checkpoints?
So I am a little confused here.
/**
 * Enables checkpoints to be persisted externally.
 *
 * <p>Externalized checkpoints write their meta data out to persistent
 * storage and are <strong>not</strong> automatically cleaned up when
 * the owning job fails or is suspended (terminating with job status
 * {@link JobStatus#FAILED} or {@link JobStatus#SUSPENDED}). In this
 * case, you have to manually clean up the checkpoint state, both
 * the meta data and actual program state.
 *
 * <p>The {@link ExternalizedCheckpointCleanup} mode defines how an
 * externalized checkpoint should be cleaned up on job cancellation. If you
 * choose to retain externalized checkpoints on cancellation you have to
 * handle checkpoint clean up manually when you cancel the job as well
 * (terminating with job status {@link JobStatus#CANCELED}).
 *
 * <p>The target directory for externalized checkpoints is configured
 * via {@link org.apache.flink.configuration.CheckpointingOptions#CHECKPOINTS_DIRECTORY}.
 *
 * @param cleanupMode Externalized checkpoint cleanup behaviour.
 */
@PublicEvolving
public void enableExternalizedCheckpoints(ExternalizedCheckpointCleanup cleanupMode) {
    this.externalizedCheckpointCleanup = checkNotNull(cleanupMode);
}
Answer 1
Score: 1
Storing checkpoints in HDFS does not by itself make them externalized. Externalized checkpoints are "external" with respect to a particular job instance. Standard checkpoints are only used to recover from failures while the job is running: if the job is cancelled or fails they are cleaned up automatically, and they have no persisted metadata, which means they are not meant to be used by anything other than that particular job instance.
Externalized checkpoints, by contrast, keep their metadata with the checkpoint data and are not removed automatically (you can configure this behaviour to some extent). So you can treat an externalized checkpoint like a savepoint, in the sense that you can use it to start another job instance after an update, failure, or cancellation.
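To make the retention rules concrete, here is a minimal toy model in plain Java. It is not Flink code: the class CheckpointRetentionSketch, its enums, and the isRetained method are hypothetical names invented for illustration, and the rules only encode what the Javadoc quoted in the question and the explanation above state. Treating a suspended job the same as a failed or cancelled one for standard checkpoints is a simplifying assumption. Note that the storage location (HDFS or otherwise) is deliberately not a parameter, since it does not decide whether a checkpoint is externalized.

```java
public class CheckpointRetentionSketch {

    // Job end states named in the Javadoc quoted in the question.
    enum JobStatus { FAILED, SUSPENDED, CANCELED }

    // Hypothetical stand-in for the two cleanup modes of ExternalizedCheckpointCleanup.
    enum CleanupMode { RETAIN_ON_CANCELLATION, DELETE_ON_CANCELLATION }

    /**
     * Returns true if the latest checkpoint is still available after the job ends.
     * A null cleanupMode models a job that never enabled externalized checkpoints.
     */
    static boolean isRetained(CleanupMode cleanupMode, JobStatus status) {
        if (cleanupMode == null) {
            // Standard checkpoint: cleaned up automatically when the job ends.
            // Assumption: SUSPENDED is treated like FAILED and CANCELED here.
            return false;
        }
        switch (status) {
            case FAILED:
            case SUSPENDED:
                // Externalized checkpoints are never cleaned up automatically here.
                return true;
            case CANCELED:
                // On cancellation the configured cleanup mode decides.
                return cleanupMode == CleanupMode.RETAIN_ON_CANCELLATION;
            default:
                throw new IllegalArgumentException("Unknown status: " + status);
        }
    }

    public static void main(String[] args) {
        CleanupMode[] modes = {
            null, CleanupMode.DELETE_ON_CANCELLATION, CleanupMode.RETAIN_ON_CANCELLATION
        };
        for (CleanupMode mode : modes) {
            String label = (mode == null) ? "not externalized" : mode.name();
            for (JobStatus status : JobStatus.values()) {
                System.out.println(label + " / " + status
                        + " -> retained: " + isRetained(mode, status));
            }
        }
    }
}
```

In an actual job, the corresponding choice is made by passing a cleanup mode such as ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION to the enableExternalizedCheckpoints method shown in the question. Whenever a checkpoint is retained, cleaning up its metadata and state becomes your responsibility.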