Event Sourcing and CQRS: Handle concurrency with multiple "read model consumers" instances

Question

I'm implementing a solution based on Event Sourcing and CQRS patterns.

In my use case I have:

  • A WRITE microservice: events are appended to a particular Stream (let's call it the X Stream) stored on an EventStoreDB instance.
  • A READ microservice: subscribed to the X Stream of the EventStoreDB; it consumes the appended events and stores the projected model in a MongoDB instance.

In a simple scenario, where there is a single instance of the READ microservice, all works as expected:

  1. An event is appended to the X Stream in the EventStoreDB
  2. The single instance of the READ microservice consumes the event and stores the projected model on the MongoDB instance (a minimal sketch of such a projector is shown below)
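
For context, this is a rough sketch of what that single-instance projector could look like in Go. It is not the actual code from the question: the `Event` type, the `subscribeToStream` placeholder, and the database/collection names are hypothetical, and the MongoDB calls assume the official Go driver (`go.mongodb.org/mongo-driver`).

```go
package main

import (
	"context"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// Event is a simplified, hypothetical view of an event read from the X Stream.
type Event struct {
	ID       string                 // unique event id
	StreamID string                 // identity of the projected document
	Data     map[string]interface{} // event payload
}

// subscribeToStream is a placeholder for an EventStoreDB catch-up subscription:
// the real implementation would forward every event of the stream, in order.
func subscribeToStream(ctx context.Context, stream string) <-chan Event {
	ch := make(chan Event)
	close(ch) // placeholder only; wire the actual EventStoreDB client here
	return ch
}

func main() {
	ctx := context.Background()

	mongoClient, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	readModel := mongoClient.Database("readmodel").Collection("x_projection")

	for evt := range subscribeToStream(ctx, "X") {
		// Upsert the projected document; with a single consumer instance there
		// is no competing writer, so a plain upsert is enough.
		_, err := readModel.UpdateOne(ctx,
			bson.M{"_id": evt.StreamID},
			bson.M{"$set": evt.Data},
			options.Update().SetUpsert(true),
		)
		if err != nil {
			log.Printf("projection failed for event %s: %v", evt.ID, err)
		}
	}
}
```

With only one such process running, the upsert never races with another writer; the problem described next appears as soon as a second identical process is started.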

Now suppose that you want to scale out the READ microservice (the event consumer) to two or more instances. This is what will happen:

  1. An event is appended to the X Stream in the EventStoreDB
  2. Each replica of the READ microservice consumes the event and tries to store the projected model on the MongoDB instance, corrupting the READ model (because of the concurrent writes).

Is there a way to handle this scenario?

Answer 1

Score: 2

Usually there is one active process with a catch-up subscription doing updates to your read model.

And possibly a second one on stand-by, in case that first process stops unexpectedly.

When that becomes too slow, you can have multiple processes and partition them in such a way that each handles a specific set of documents on the same target store.
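
As a rough sketch of that partitioning idea (not from the answer itself): each replica could be started with hypothetical PARTITION_INDEX / PARTITION_COUNT settings and only project the documents whose ID hashes into its partition, so every document keeps exactly one writer.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"os"
	"strconv"
)

// ownedBy maps a document/stream id to the partition (consumer instance) that
// is responsible for it, so every projected document has exactly one writer.
func ownedBy(documentID string, totalPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(documentID))
	return int(h.Sum32()) % totalPartitions
}

func main() {
	// Hypothetical configuration: each replica is started with its own index.
	partition, _ := strconv.Atoi(os.Getenv("PARTITION_INDEX")) // e.g. 0, 1, 2
	total, _ := strconv.Atoi(os.Getenv("PARTITION_COUNT"))     // e.g. 3
	if total <= 0 {
		total = 1
	}

	// Inside the subscription loop, each replica skips documents it does not own.
	for _, streamID := range []string{"order-1", "order-2", "order-3"} {
		if ownedBy(streamID, total) != partition {
			continue // some other replica projects this document
		}
		fmt.Printf("replica %d projects %s\n", partition, streamID)
	}
}
```

Note that in this sketch every replica still reads the complete X Stream; only the writes are partitioned, which preserves both per-document ordering and the single-writer guarantee.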

Answer 2

Score: 0

Yves wrote the correct answer; I just want to add a couple of things.

I could add that the write-side and read-side scaling models are completely different:

  • The write side scales seamlessly, without many constraints except for how many read-write operations the database can tolerate (which depends on the instance size).
  • The read side is constrained by the performance of the target database. Obviously, running a projector linearly in a single subscription instance will hit the physical limit of how many round trips to the target database you can do in a given period of time (say, one second).
  • The read side's scalability also depends on the ordering requirement. If you need events to be ordered across the whole log, or across a category, that's one thing; if you only care about events from a single stream being projected in order, it's different. The ordering requirement gives you an idea of how you can partition the read model updates.

I made projections a lot faster by applying partitioning by stream, but it is still a single process. In many cases that is fine, as it can project thousands of events per minute. The high-availability concern is legitimate for the purpose of increased redundancy in case of failures, but then again, applying simple health checks as a preventive measure will ensure that the subscription workload gets restarted if it gets stuck.
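
A minimal sketch of what partitioning by stream inside a single process could look like in Go; the Event type, the worker count, and the sample events are illustrative assumptions, not the author's implementation. Events are dispatched to a worker chosen by hashing the stream ID, so each stream keeps its ordering while different streams are projected concurrently.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// Event is a simplified, hypothetical event; StreamID identifies the stream.
type Event struct {
	StreamID string
	Type     string
}

// workerFor maps a stream id to one of n workers, so that all events of the
// same stream are handled by the same worker and therefore stay in order.
func workerFor(streamID string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(streamID))
	return int(h.Sum32()) % n
}

func main() {
	const workers = 4
	queues := make([]chan Event, workers)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		queues[i] = make(chan Event, 100)
		wg.Add(1)
		go func(id int, in <-chan Event) {
			defer wg.Done()
			for evt := range in {
				// Each worker performs the MongoDB round trips for its streams.
				fmt.Printf("worker %d projects %s/%s\n", id, evt.StreamID, evt.Type)
			}
		}(i, queues[i])
	}

	// The single subscription feeds the workers; per-stream ordering is kept
	// because a given stream always lands on the same worker.
	for _, evt := range []Event{
		{StreamID: "account-1", Type: "Deposited"},
		{StreamID: "account-2", Type: "Opened"},
		{StreamID: "account-1", Type: "Withdrawn"},
	} {
		queues[workerFor(evt.StreamID, workers)] <- evt
	}

	for _, q := range queues {
		close(q)
	}
	wg.Wait()
}
```

In this sketch the subscription itself stays single (so the read position is still tracked in one place), while the round trips to the target database happen in parallel across streams.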

We are baking something that might remove the whole issue altogether, but I won't dare mention any dates, as we are still researching the topic.

Answer 3

Score: 0

"现在假设您想要扩展READ微服务(事件的消费者)到两个或更多实例。将会发生以下情况:

  1. 事件附加到EventStoreDB中的X Stream。
  2. READ微服务的每个副本都会消费事件,并尝试将投影模型存储在MongoDB实例上,由于并发写入而破坏READ模型。

是否有处理这种情况的方法?"

是的,请运行四个不同的读模型,而不是一个。

我知道这听起来非常明显,但许多人忽略了它。请运行四个不同的MongoDB实例,而不是一个集群,然后将数据分别放入其中。

英文:

"Now suppose that you want to scale out the READ microservice (the event's consumer) to two or more instances. This is what will happen:

An event is appended to the X Stream in the EventStoreDB
Each replica of the READ microservice consumes the event and tries to store the projected model on the MongoDB instances, corrupting the READ model (because of the concurrent write).
Is there a way to handle this scenario?"

Yes: run with four distinct read models, not one.

I know that sounds blatantly obvious, but many miss it. Run four distinct instances of MongoDB, not one cluster, with four separate sets of data being put into them.
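
If you go that route, the only code change on the READ side is configuration. Here is a sketch, assuming a hypothetical READ_MODEL_URI environment variable and the official MongoDB Go driver, where each replica connects to its own target instance instead of a shared one.

```go
package main

import (
	"context"
	"log"
	"os"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	// Hypothetical deployment convention: every READ replica is started with the
	// URI of its own read model database, e.g.
	//   replica 1: READ_MODEL_URI=mongodb://read-model-1:27017
	//   replica 2: READ_MODEL_URI=mongodb://read-model-2:27017
	uri := os.Getenv("READ_MODEL_URI")
	if uri == "" {
		log.Fatal("READ_MODEL_URI must point to this replica's own MongoDB instance")
	}

	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI(uri))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// Each replica then subscribes to the X Stream and projects into its own
	// database, so no two writers ever share a target read model.
	_ = client.Database("readmodel").Collection("x_projection")
}
```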

Answer 4

Score: 0

Still thinking back on this use case, I'm wondering whether implementing idempotent listeners could be an elegant idea:

For example, one might use the Event ID as a correlation-id in the Read Model to ensure the idempotency of a listener.
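
A rough sketch of that idea with the official MongoDB Go driver; the appliedEvents field, the collection and document names, and the retry policy are illustrative assumptions, not a prescribed implementation. The event ID is recorded on the projected document, and the update only matches when that ID has not been applied yet.

```go
package main

import (
	"context"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// applyOnce applies an event's changes to a projected document only if that
// event ID has not been recorded on the document yet, making the listener
// idempotent: re-delivering the same event is a no-op.
func applyOnce(ctx context.Context, coll *mongo.Collection, docID, eventID string, changes bson.M) error {
	filter := bson.M{"_id": docID, "appliedEvents": bson.M{"$ne": eventID}}
	update := bson.M{
		"$set":      changes,
		"$addToSet": bson.M{"appliedEvents": eventID},
	}
	for attempt := 0; attempt < 2; attempt++ {
		_, err := coll.UpdateOne(ctx, filter, update, options.Update().SetUpsert(true))
		if err == nil {
			return nil
		}
		if mongo.IsDuplicateKeyError(err) {
			// The document exists but did not match the filter: either this event
			// was already applied (safe to skip), or another replica created the
			// document concurrently with a different event; one retry resolves
			// the latter case by updating the now-existing document.
			continue
		}
		return err
	}
	return nil
}

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	orders := client.Database("readmodel").Collection("orders")

	// Delivering the same event twice leaves the document unchanged the second time.
	for i := 0; i < 2; i++ {
		if err := applyOnce(ctx, orders, "order-42", "event-123", bson.M{"status": "shipped"}); err != nil {
			log.Fatal(err)
		}
	}
}
```

A duplicate delivery then becomes a safe no-op, which protects against re-processing the same event across consumers; it does not, by itself, guarantee any ordering between different events handled by different replicas.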

What do you think about this approach?
