Partial ordering of events in distributed systems in practice

We are using Symfony Messenger with Amazon SQS as the message queue. We have no guarantee that events will be processed in the same order as they were dispatched.

Example

We have a Payment class:

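class Payment {
    // Status field added for illustration.
    private string $status = 'initialized';

    // Named constructor: returns the newly created Payment.
    public static function create(): self { return new self(); }
    public function setAsCharged(): void { $this->status = 'charged'; }
    public function setAsFailed(): void { $this->status = 'failed'; }
}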

We receive a batch of actions that happened on a Payment and dispatch them to the queue:

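// $messenger is the Symfony Messenger bus (Symfony\Component\Messenger\MessageBusInterface);
// PaymentEvent is our own message class.
$actions = [
    ['initialization', '2023-01-01 00:00:00'],
    ['charge', '2023-01-01 00:00:10'],
];
foreach ($actions as $action) {
    $messenger->dispatch(new PaymentEvent($action));
}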

Current implementation

Events are dispatched to the message queue in order, but because SQS standard queues only provide best-effort ordering, the charge event can be processed before the initialization event.
We throw an exception because the Payment does not exist yet, and the message is then retried by the retry mechanism. Meanwhile, the initialization event is processed on a different worker. When charge is retried, the Payment already exists, everything works, and the Payment ends up in the correct state.
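To make the current behaviour concrete, here is a minimal in-memory simulation of it. This is only a sketch: `PaymentNotFound`, `handle()` and the array used as a queue are hypothetical stand-ins, not Symfony Messenger APIs, and re-enqueueing the message stands in for Messenger's retry strategy.

```php
<?php
// Hypothetical exception thrown when a charge arrives before its Payment exists.
final class PaymentNotFound extends RuntimeException {}

/** @param array<string, string> $payments payment id => status (in-memory "database") */
function handle(array &$payments, array $event): void
{
    [$type, $paymentId] = $event;
    if ($type === 'initialization') {
        $payments[$paymentId] = 'initialized';
        return;
    }
    if (!isset($payments[$paymentId])) {
        // The charge arrived first: fail so that the message is retried later.
        throw new PaymentNotFound("Payment $paymentId does not exist yet");
    }
    $payments[$paymentId] = 'charged';
}

// Delivered out of order, as a standard SQS queue may do.
$queue = [['charge', 'p1'], ['initialization', 'p1']];
$payments = [];
$retries = 0;

while (($event = array_shift($queue)) !== null) {
    try {
        handle($payments, $event);
    } catch (PaymentNotFound) {
        $retries++;
        $queue[] = $event; // put the message back, like a delayed retry
    }
}

echo "{$payments['p1']} after {$retries} retry\n"; // charged after 1 retry
```

The loop assumes the initialization event eventually arrives. A real retry strategy caps the number of attempts, and that cap is exactly where a slow initialization and a genuine failure become hard to tell apart.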

Problem

This is not recommended behavior: exceptions should not be used for control flow.
Moreover, we cannot distinguish between events that really failed and events that failed only because creating the Payment took too long.

Solutions we looked at, but found very complex

I have studied some algorithms for achieving partial ordering in distributed systems, such as Lamport timestamps, Paxos, and Raft.

If I understand them correctly, the workers must somehow communicate with each other. Should we use Redis, MySQL, or some other persistent storage that all of the workers can access?
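Of the algorithms listed, Lamport timestamps are the one that directly produces an ordering of events; Paxos and Raft are consensus protocols. A minimal sketch of a Lamport clock in PHP follows. The class and method names are hypothetical; the idea is that the producer stamps each message and every receiver advances its own clock past the stamp it receives, so the timestamps travel with the messages rather than through shared storage.

```php
<?php
final class LamportClock
{
    private int $time = 0;

    // Local event or message send: advance the clock and return the new timestamp.
    public function tick(): int
    {
        return ++$this->time;
    }

    // Message receive: move past the sender's timestamp.
    public function receive(int $remoteTime): int
    {
        $this->time = max($this->time, $remoteTime) + 1;
        return $this->time;
    }

    public function now(): int
    {
        return $this->time;
    }
}

$producer = new LamportClock();
$worker = new LamportClock();

$tInit = $producer->tick();   // 1: initialization dispatched
$tCharge = $producer->tick(); // 2: charge dispatched

// The worker receives charge first, then initialization.
$afterCharge = $worker->receive($tCharge); // 3
$afterInit = $worker->receive($tInit);     // 4
```

The timestamps let a worker order the events it has already received (1 < 2 says initialization happened before charge), but they do not tell it that an event with a lower timestamp is still in flight, so on their own they do not remove the need to wait or retry.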

We are using PHP, and I didn't find any examples or libraries implementing them that we could use. It looks like a very complex problem that is not worth solving, especially if we can use the retry mechanism.

We do not want to use a FIFO queue because of its lower performance.

Another solution we are trying

We are currently trying to address the problem with process managers. With them, we can save an event into something like a buffer, and once the expected message has been received and processed, we move on to processing the next event in the buffer.
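The buffering idea can be sketched per payment as below. This assumes each event carries a per-payment sequence number, which is a hypothetical field: the events in the example only carry a timestamp, and a timestamp alone does not tell the buffer whether an earlier event is still missing. State is kept in memory here; a buffer shared by several workers would have to live in storage all of them can reach, with concurrent updates guarded.

```php
<?php
final class PaymentEventBuffer
{
    /** @var array<string, int> next expected sequence number per payment */
    private array $next = [];
    /** @var array<string, array<int, string>> events parked until their turn */
    private array $pending = [];
    /** @var list<string> events applied, in order */
    public array $applied = [];

    public function receive(string $paymentId, int $seq, string $type): void
    {
        $this->pending[$paymentId][$seq] = $type;
        $expected = $this->next[$paymentId] ?? 1;

        // Apply every consecutive event that is now ready, then stop at the first gap.
        while (isset($this->pending[$paymentId][$expected])) {
            $this->applied[] = "$paymentId:" . $this->pending[$paymentId][$expected];
            unset($this->pending[$paymentId][$expected]);
            $expected++;
        }
        $this->next[$paymentId] = $expected;
    }
}

$buffer = new PaymentEventBuffer();
$buffer->receive('p1', 2, 'charge');         // parked: sequence 1 not seen yet
$buffer->receive('p1', 1, 'initialization'); // applies 1, then the parked 2
print_r($buffer->applied);
```

Nothing in the class is specific to payments except the key, which is one way to keep such a buffer from having to be rewritten for every domain.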

It is starting to become very complex and must be tailored to every domain we model.

Question

  1. Do you think the retry mechanism is good enough?
  2. Is there a less complex solution that can be used in general?
  3. Are any of the algorithms mentioned above suitable for our technology stack?

Answer 1

Score: 1

Two thoughts:

  • If you only need a partial (vs. a total) ordering, then it's likely you can have as many FIFO queues as needed (up to the number of partitions of possible events such that no two events in different partitions have an ordering requirement) and direct events to the appropriate queue based on the ordering requirements of that event.
  • Alternatively, you might be able to exploit some aspect of your domain to represent the not-yet-seen event. For instance, if a charge event for a given payment is seen before the initialization event, it can result in a state that's basically "this payment hasn't been initialized but has been charged".
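Both ideas are small in code. In the sketch below all names are hypothetical (`partitionFor()`, the `payments-fifo-N` queue names, `PaymentState` and its state strings): the function maps a payment to a fixed partition so all of its events share one FIFO queue, and the class accepts `charge` and `initialization` in either order. With SQS FIFO queues, a similar effect is also available inside a single queue through the message group ID: messages are ordered per group, and different groups can be processed in parallel.

```php
<?php
// Idea 1: every event of the same payment goes to the same FIFO partition,
// so ordering is kept per payment while different payments run in parallel.
function partitionFor(string $paymentId, int $partitions): string
{
    return 'payments-fifo-' . (crc32($paymentId) % $partitions);
}

// Idea 2: the model can represent "charged, but not initialized yet".
final class PaymentState
{
    private bool $initialized = false;
    private bool $charged = false;

    // Unknown event types throw UnhandledMatchError.
    public function apply(string $event): void
    {
        match ($event) {
            'initialization' => $this->initialized = true,
            'charge' => $this->charged = true,
        };
    }

    public function state(): string
    {
        return match (true) {
            $this->initialized && $this->charged => 'charged',
            $this->charged => 'charged_not_initialized',
            $this->initialized => 'initialized',
            default => 'unknown',
        };
    }
}

$payment = new PaymentState();
$payment->apply('charge');
$mid = $payment->state();   // 'charged_not_initialized'
$payment->apply('initialization');
$final = $payment->state(); // 'charged'
```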

huangapple
  • Posted on April 6, 2023 at 21:15:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75949970.html