Can a checkpoint complete in Flink while a task is down?


Question

I am using Flink with five task managers, each with a single task slot.

The source is Kafka; after reading and processing the data, it is written to a DynamoDB sink.

Note that I chain all the operators, so source, process, and sink are combined into a single operator chain.

I also enable checkpointing with exactly-once semantics and a 5-minute interval (a configuration sketch follows below).
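For reference, a minimal sketch of such a setup using the DataStream API; the class name and job name are placeholders I introduced, not from the post:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaToDynamoJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Exactly-once checkpoints every 5 minutes, as described above.
        env.enableCheckpointing(5 * 60 * 1000L, CheckpointingMode.EXACTLY_ONCE);

        // Operator chaining is enabled by default: a source -> process -> sink
        // pipeline with uniform parallelism collapses into one task per slot.
        // env.disableOperatorChaining(); // uncomment to break the chain apart

        // ... build the Kafka source -> process -> DynamoDB sink pipeline here ...

        env.execute("kafka-to-dynamodb"); // job name is a placeholder
    }
}
```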

In this setup, the Kafka cluster went through a rolling update and the group coordinator changed.

During the update, Flink logged warnings that committing offsets back to the Kafka source cluster failed.

This leads to the following questions.

  1. Is it possible for a checkpoint to complete successfully in this situation? The Flink dashboard shows no failed checkpoints.

  2. If so, could it be that a task was down while its task manager stayed up, allowing the checkpoint to complete? (i.e. the task failure was trivial enough not to affect checkpointing)

  3. With the operators chained together, is data loss possible even with checkpointing? Since the operators are combined from source to sink, when the source is affected by the external Kafka cluster, the sink is affected too because of the chaining. If the task then goes down, is the in-flight data discarded?


Answer 1 (score: 1)


  1. Is it possible for a checkpoint to complete successfully? The Flink dashboard shows no failed checkpoints.

Yes. The message you saw appears when the Kafka consumer used by Flink tries to commit offsets back to the Kafka cluster, but those commits are not used by checkpointing. Flink saves offsets in its own snapshots and uses them when restoring from a checkpoint or savepoint, so it does not depend on the offsets committed to the Kafka cluster. A sketch of this distinction follows below.
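For illustration only, a minimal sketch using the newer KafkaSource connector; the broker address, topic, and group id are made-up placeholders, not from the original post:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;

KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("broker:9092")   // placeholder address
        .setTopics("events")                  // placeholder topic
        .setGroupId("flink-consumer")         // placeholder group id
        // Offsets committed to Kafka are only consulted on a fresh start; a
        // restore from a checkpoint/savepoint uses the offsets in Flink state.
        .setStartingOffsets(OffsetsInitializer.committedOffsets())
        // Committing back to Kafka is best-effort bookkeeping; a failed commit
        // (e.g. during a broker rolling update) does not fail the checkpoint.
        .setProperty("commit.offsets.on.checkpoint", "true")
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();
```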

  2. If so, could it be that a task was down but the task manager was not, allowing the checkpoint to complete [snip]

I assume you're asking whether the job can be in a failed state while the Task Manager is still running. Yes, but that has nothing to do with your first question (it's not why a checkpoint could complete even though you see errors).

  3. With the operators chained together, is data loss possible even with checkpointing? Since the operators are combined from source to sink, when the source is affected by the external Kafka cluster, the sink is affected too; if the task then goes down, is the in-flight data discarded?

You always "lose" in-flight data when a job fails. But because Flink saves the offsets for all Kafka partitions as part of the last successful checkpoint, when the job is restarted from that checkpoint, you will wind up replaying all of the in-flight data. So there is no data loss.
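To make the replay behavior concrete, a minimal sketch assuming Flink 1.15+ (for the externalized-checkpoint API); the restart count and delay are arbitrary placeholder values:

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Inside your job setup:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// On task failure, restart the job automatically; the Kafka source rewinds to
// the offsets stored in the last successful checkpoint, so records that were
// in flight between that checkpoint and the failure are simply re-read.
env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.seconds(10)));

// Retain the latest completed checkpoint even on cancellation, so the job can
// also be restarted from it manually.
env.getCheckpointConfig()
   .setExternalizedCheckpointCleanup(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
```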
