April 6, 2023 21:15:17

Partial ordering of events in distributed system in practice

Question

We are using Symfony Messenger with Amazon SQS as the message queue. We have no guarantee that events will be processed in the same order as they were dispatched.

Example

We have a Payment class:

class Payment {
    public static function create(): void;
    public function setAsCharged(): void;
    public function setAsFailed(): void;
}

We receive a set of actions that happened on a Payment in one batch and dispatch them to the queue:

// Each action is [type, timestamp], listed in the order it happened.
$actions = [
    ['initialization', '2023-01-01 00:00:00'],
    ['charge', '2023-01-01 00:00:10'],
];
foreach ($actions as $action) {
    // One message per action; SQS may deliver them out of order.
    $messenger->dispatch(new PaymentEvent($action));
}

Current implementation

Events are dispatched to the message queue in order, but because of how SQS works, the charge event can be processed before the initialization event.
When that happens, we throw an exception because the Payment does not exist yet. The message is then retried by the retry mechanism. Meanwhile, the initialization event is processed on a different worker. When charge is retried, the Payment already exists. Everything is good, and the Payment ends up in the correct state.

Problem

This is not recommended behavior. Exceptions should not be used to control application flow.
Moreover, we cannot distinguish between events that really failed and events that failed only because creating the Payment took too long.
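
One way to make that distinction visible is to give the "predecessor not processed yet" case its own exception type, so retries caused by ordering can be told apart from genuine failures. The following is only a minimal sketch: `PaymentNotReadyException`, `InvalidPaymentEventException`, `handlePaymentEvent`, and the in-memory `$payments` array are hypothetical names invented for illustration, not part of Symfony Messenger. As far as I know, Messenger ships `RecoverableMessageHandlingException` and `UnrecoverableMessageHandlingException` for a similar purpose, but check the documentation for your version.

```php
<?php
declare(strict_types=1);

// Hypothetical: the event arrived before the event it depends on. Safe to retry.
final class PaymentNotReadyException extends RuntimeException {}

// Hypothetical: the event can never be applied. Retrying will not help.
final class InvalidPaymentEventException extends RuntimeException {}

/**
 * Applies one event to an in-memory map of payment id => status.
 * A real handler would load the Payment from shared storage instead.
 *
 * @param array<string, string> $payments
 */
function handlePaymentEvent(array &$payments, string $paymentId, string $type): void
{
    switch ($type) {
        case 'initialization':
            // Idempotent: a redelivered initialization does not reset the status.
            $payments[$paymentId] ??= 'initialized';
            return;
        case 'charge':
            if (!isset($payments[$paymentId])) {
                // Expected under out-of-order delivery.
                throw new PaymentNotReadyException("Payment $paymentId is not initialized yet");
            }
            $payments[$paymentId] = 'charged';
            return;
        default:
            throw new InvalidPaymentEventException("Unknown event type: $type");
    }
}
```

With separate types, logging and alerting can ignore the first kind until the retry limit is reached, while the second kind can be sent straight to a failure transport.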

Solutions considered, but very complex

I have studied some algorithms for achieving partial ordering in distributed systems, such as Lamport timestamps, Paxos, and Raft.

If I understand them correctly, workers must somehow communicate with each other. Should we use Redis, MySQL, or any other persistent storage that can be accessed by all of the workers?
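
For reference, the core of a Lamport timestamp is small. The sketch below assumes each worker keeps its own counter and attaches it to every message it sends; the class name `LamportClock` and its methods are hypothetical. Note that such a clock only yields an ordering consistent with causality between events that were already seen. It does not by itself make a worker wait for a predecessor that has not arrived.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal Lamport clock, one instance per worker.
final class LamportClock
{
    private int $time = 0;

    // Call before a local event or before sending a message;
    // the returned value is the timestamp to attach.
    public function tick(): int
    {
        return ++$this->time;
    }

    // Call when a message carrying timestamp $received arrives.
    public function receive(int $received): int
    {
        $this->time = max($this->time, $received) + 1;
        return $this->time;
    }

    public function now(): int
    {
        return $this->time;
    }
}
```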

We are using PHP, and I did not find any examples or libraries with a ready-to-use implementation. It looks like a very complex problem that is not worth solving this way, especially if we can use the retry mechanism.

We do not want to use a FIFO queue because of its lower performance.

Another solution we are trying

We are currently trying to use process managers to address this problem. With them, we can save an event to something like a buffer, and once the expected message has been received and processed, we move on to processing the next event in the buffer.
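
The buffering idea can be sketched roughly as follows, assuming the only ordering rule is "initialization before everything else for the same payment". `PaymentEventBuffer` is a hypothetical, in-memory illustration; a real process manager would keep the parked events in storage shared by all workers.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory buffer keyed by payment id.
final class PaymentEventBuffer
{
    /** @var array<string, bool> payment id => initialized */
    private array $initialized = [];

    /** @var array<string, list<string>> payment id => parked event types */
    private array $parked = [];

    /** @var list<string> log of applied events, for illustration only */
    public array $applied = [];

    public function receive(string $paymentId, string $type): void
    {
        if ($type === 'initialization') {
            $this->initialized[$paymentId] = true;
            $this->applied[] = "$paymentId:initialization";
            // Release everything that was waiting for this payment.
            foreach ($this->parked[$paymentId] ?? [] as $waiting) {
                $this->applied[] = "$paymentId:$waiting";
            }
            unset($this->parked[$paymentId]);
            return;
        }

        if (!isset($this->initialized[$paymentId])) {
            // Predecessor not seen yet: park the event instead of throwing.
            $this->parked[$paymentId][] = $type;
            return;
        }

        $this->applied[] = "$paymentId:$type";
    }
}
```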

It is starting to become very complex and must be tailored to every domain we are modeling.

Question

Do you think the retry mechanism is good enough?
Is there a less complex solution that can be used in general?
Are any of the algorithms mentioned above suitable for our technology stack?

Answer 1

Score: 1

Two thoughts:

If you only need a partial (vs. a total) ordering, then it's likely you can have as many FIFO queues as needed (up to the number of partitions of possible events such that no two events in different partitions have an ordering requirement) and direct events to the appropriate queue based on the ordering requirements of that event.
Alternatively, you might be able to exploit some aspect of your domain to represent the not-yet-seen event. For instance, a charge event for a given payment, if seen before the initialization event, can result in a state that is basically "this payment hasn't been initialized but has been charged".
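
Both ideas can be sketched in a few lines. Everything named here (`partitionFor`, `PaymentState`, the status strings) is hypothetical and only illustrates the shape of the approach. The first function keeps all events of one payment in the same partition; with SQS FIFO queues, using the payment ID as the message group ID plays a similar role. The class records each event as an independent fact, so the resulting state does not depend on arrival order.

```php
<?php
declare(strict_types=1);

// Idea 1: all events of one payment map to the same partition (queue index).
function partitionFor(string $paymentId, int $partitions): int
{
    // Mask keeps the hash non-negative on 32-bit builds as well.
    return (crc32($paymentId) & 0x7fffffff) % $partitions;
}

// Idea 2: a state that tolerates any arrival order.
final class PaymentState
{
    public bool $initialized = false;
    public bool $charged = false;

    public function apply(string $type): void
    {
        if ($type === 'initialization') {
            $this->initialized = true;
        } elseif ($type === 'charge') {
            $this->charged = true;
        }
    }

    public function status(): string
    {
        if ($this->charged && !$this->initialized) {
            return 'charged, awaiting initialization';
        }
        if ($this->charged) {
            return 'charged';
        }
        return $this->initialized ? 'initialized' : 'unknown';
    }
}
```

The second approach removes the need for retries in this example, at the cost of modeling the intermediate states explicitly for each domain.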
