Cosmos change feed processor and concurrency within a single host

Question

After going through the Cosmos DB documentation, I am a bit confused about the change feed processor library.

  1. Does a lease document exist for each physical partition or for each logical partition?
  2. If I have only one host, with multiple logical partitions but a single physical partition, will the change feed processor use multiple threads, one per logical partition? I have implemented the Cosmos change feed processor library like this:
        var changeFeedProcessorInstance = new ChangeFeedProcessorBuilder()
            .options(cfOptions)
            .hostName(hostName)
            .feedContainer(container)
            .leaseContainer(leaseContainer)
            .handleChanges((List<JsonNode> docs) -> {
                for (JsonNode document : docs) {
                    // Doing some processing
                }
            })
            .buildChangeFeedProcessor();
        changeFeedProcessorInstance.start()
            .subscribeOn(Schedulers.elastic())
            .doOnSuccess(aVoid -> {
            })
            .subscribe();

My assumption is that, because of Schedulers.elastic(), a new thread can be used for each logical partition.

Answer 1

Score: 3

The current implementation uses one lease document per physical partition.
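
To illustrate why a single physical partition yields a single lease no matter how many logical partitions it holds, here is a minimal Java sketch under stated assumptions. It models partition keys being hashed onto physical partitions, with `String.hashCode()` standing in for the service's real partition-key hashing; the class name `LeasePerPhysicalPartition` and the `lease-N` names are hypothetical. With one physical partition, every logical key lands under the same lease, so whichever host owns that lease sees the changes for all of them.

```java
import java.util.*;

// Hypothetical model: String.hashCode() stands in for the service's real
// partition-key hashing, and the lease names are illustrative only.
class LeasePerPhysicalPartition {

    // Map a logical partition key onto one of the physical partitions.
    static int physicalPartitionOf(String logicalKey, int physicalPartitions) {
        return Math.floorMod(logicalKey.hashCode(), physicalPartitions);
    }

    // Create one lease per physical partition, then record which logical
    // keys fall under each lease.
    static Map<String, List<String>> keysByLease(List<String> logicalKeys, int physicalPartitions) {
        Map<String, List<String>> byLease = new LinkedHashMap<>();
        for (int p = 0; p < physicalPartitions; p++) {
            byLease.put("lease-" + p, new ArrayList<>());
        }
        for (String key : logicalKeys) {
            byLease.get("lease-" + physicalPartitionOf(key, physicalPartitions)).add(key);
        }
        return byLease;
    }

    public static void main(String[] args) {
        List<String> logicalKeys = List.of("tenant-a", "tenant-b", "tenant-c", "tenant-d");
        // The question's setup: several logical partitions, one physical partition.
        System.out.println(keysByLease(logicalKeys, 1));
        // prints {lease-0=[tenant-a, tenant-b, tenant-c, tenant-d]}
    }
}
```

The number of leases depends only on the number of physical partitions; adding logical partition keys changes which lease a key falls under, never how many leases exist.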

Parallelization is based on leases. If the lease store has 10 leases, the work can be distributed across up to 10 hosts, because a lease can be owned by only a single host at any given point in time (you can use fewer hosts, and the leases will be distributed evenly among them).
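
The ownership rule can be sketched as follows. This is a hypothetical round-robin model, not the library's actual balancing strategy; `LeaseDistribution` and `distribute` are illustrative names. It only demonstrates the two properties described above: each lease has exactly one owner, and hosts beyond the lease count receive nothing.

```java
import java.util.*;

// Hypothetical sketch of lease ownership: every lease gets exactly one owner,
// and leases are spread as evenly as possible over the available hosts.
// This is NOT the change feed processor's real balancing algorithm.
class LeaseDistribution {

    static Map<String, List<String>> distribute(List<String> leases, List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one host");
        }
        Map<String, List<String>> owned = new LinkedHashMap<>();
        for (String host : hosts) {
            owned.put(host, new ArrayList<>());
        }
        // Round-robin: host i receives leases i, i + hosts.size(), ...
        for (int i = 0; i < leases.size(); i++) {
            owned.get(hosts.get(i % hosts.size())).add(leases.get(i));
        }
        return owned;
    }

    public static void main(String[] args) {
        List<String> leases = new ArrayList<>();
        for (int i = 0; i < 10; i++) leases.add("lease-" + i);

        // 10 leases over 4 hosts: the hosts own 3, 3, 2 and 2 leases.
        System.out.println(distribute(leases, List.of("host-1", "host-2", "host-3", "host-4")));
        // prints {host-1=[lease-0, lease-4, lease-8], host-2=[lease-1, lease-5, lease-9], host-3=[lease-2, lease-6], host-4=[lease-3, lease-7]}

        // 10 leases over 12 hosts: the two extra hosts own nothing.
        List<String> manyHosts = new ArrayList<>();
        for (int i = 1; i <= 12; i++) manyHosts.add("host-" + i);
        long idle = distribute(leases, manyHosts).values().stream().filter(List::isEmpty).count();
        System.out.println("idle hosts: " + idle); // prints "idle hosts: 2"
    }
}
```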

If your lease store contains 1 lease (because the collection has 1 physical partition), then at most 1 host can listen for changes. Adding more hosts will just leave the extra ones sitting idle. If the collection grows, for example because more storage is being used, and new physical partitions are added dynamically, new leases will be added dynamically too, and the extra instances can automatically start picking them up (if more instances than leases are available). The library redistributes leases across hosts automatically as both dimensions (the number of leases and the number of hosts) change.
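
A rough sketch of that dynamic behavior, again under assumptions: `LeaseRebalancer` is a hypothetical class that keeps existing owners that are still alive and hands every unowned lease (new, or left behind by a departed host) to the least-loaded host. Moving already-owned leases onto a newly joined host, which a fully even split would require, is left out for brevity, and the lease names are illustrative rather than what the library actually writes to the lease container.

```java
import java.util.*;

// Hypothetical rebalancing sketch, not the library's algorithm: keep owners
// that are still alive, give each unowned lease to the least-loaded host.
// Taking leases away from busy hosts for a newcomer is intentionally omitted.
class LeaseRebalancer {

    static Map<String, String> rebalance(Map<String, String> ownerByLease,
                                         List<String> leases, List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one host");
        }
        Map<String, String> next = new LinkedHashMap<>();
        Map<String, Integer> load = new LinkedHashMap<>();
        for (String host : hosts) load.put(host, 0);

        // Keep ownership of leases that still exist and whose owner is still alive.
        for (String lease : leases) {
            String owner = ownerByLease.get(lease);
            if (owner != null && load.containsKey(owner)) {
                next.put(lease, owner);
                load.merge(owner, 1, Integer::sum);
            }
        }
        // Hand every remaining lease to the host that currently owns the fewest.
        for (String lease : leases) {
            if (next.containsKey(lease)) continue;
            String leastLoaded = hosts.get(0);
            for (String host : hosts) {
                if (load.get(host) < load.get(leastLoaded)) leastLoaded = host;
            }
            next.put(lease, leastLoaded);
            load.merge(leastLoaded, 1, Integer::sum);
        }
        return next;
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("host-1", "host-2", "host-3");
        // One lease: host-2 and host-3 stay idle.
        Map<String, String> before = rebalance(new HashMap<>(), List.of("lease-0"), hosts);
        System.out.println(before); // prints {lease-0=host-1}
        // The lease store now holds three leases, so the idle hosts pick them up.
        Map<String, String> after = rebalance(before, List.of("lease-0", "lease-1", "lease-2"), hosts);
        System.out.println(after); // prints {lease-0=host-1, lease-1=host-2, lease-2=host-3}
    }
}
```

Starting from one lease and three hosts, two hosts sit idle; once the lease store holds three leases, each host ends up owning one without any change to the host code.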

  • Posted by huangapple on August 7, 2020, 06:55:30
  • Article link: https://go.coder-hub.com/63292861.html