September 30, 2020 16:26:12

Can Kafka be used as a distributed work queue?

Question

I'm considering using Kafka as a distributed work queue from which multiple workers can retrieve tasks. My original design looks like this:

Work Producer ---> Kafka topic ------worker 1
                               |
                               |__worker 2
                               ...
                               |__worker n

The problems with this design are these:

If a worker takes a task from the topic and immediately commits the offset, then in case of failure the task may not be reprocessed.
If a worker takes a task from the topic and commits the offset only on finish, then other workers may also take this task and process it. If the task is long-running, almost all workers will take the same task and process it, completely defeating the purpose of distributing the work.

I'm looking for a way to "mark" a task in the queue as "in progress" so that it is not consumed by anyone else, but without committing the offset (because the task may fail and need reprocessing). Is this possible to implement?

Answer 1

Score: 3

> If some worker takes a task from the topic and immediately commits offset then in case of failure the task may not be reprocessed.

In that case I recommend committing offsets manually, only after a task has finished, and disabling auto-commit on your consumer (enable.auto.commit=false).

> If some worker takes a task from the topic and commits offset only on finish then other workers may also take this task and process it. If the task is pretty long lasting then almost all workers will take the same task and process it, completely inhibiting the distributing nature.

You could deal with this scenario by giving your topic multiple partitions and putting your consumers into a ConsumerGroup. In Kafka, each partition is assigned to at most one consumer within a Consumer Group at any given time.

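That means that, as long as all your consumers (or "workers") belong to the same ConsumerGroup, two workers will not read and process the same partition, and therefore the same message, at the same time. A message is only handed to another worker if its partition is reassigned before the offset is committed, for example when the original worker fails or takes longer than max.poll.interval.ms between polls.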
