Cosmos .NET SDK bulk execution without exceeding provisioned throughput

Question

I have a NoSQL Cosmos DB container with manually provisioned throughput.

I have an application that upserts a lot of documents at once using the v3 .NET SDK for Cosmos DB with bulk execution mode enabled.

I am seeing 429 (Too Many Requests) errors surfaced at the application level. How can I avoid this?

I am using bulk execution mode because I want to upsert the documents as quickly as possible. But of course I don't want to exceed my manually provisioned throughput. What can I do?

This article mentions that max RU consumption increases with bulk execution mode.
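
For reference, a minimal sketch of the setup described above, assuming the Microsoft.Azure.Cosmos v3 package; the database/container names, the MyDoc type, and the /Pk partition key are placeholders rather than the real schema:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class BulkUpsertSetup
{
    // Hypothetical document shape; "id" and the /Pk partition key path are placeholders.
    public record MyDoc(string id, string Pk);

    public static async Task UpsertAllAsync(string connectionString, IReadOnlyList<MyDoc> docs)
    {
        // AllowBulkExecution turns on bulk mode: the SDK groups concurrent
        // point operations into fewer, larger service requests.
        var client = new CosmosClient(connectionString, new CosmosClientOptions
        {
            AllowBulkExecution = true
        });

        Container container = client.GetContainer("my-database", "my-container"); // placeholders

        // Queue all upserts and let the SDK batch them; throttling errors that
        // exhaust the built-in retries surface when the tasks are awaited.
        var tasks = new List<Task>();
        foreach (var doc in docs)
        {
            tasks.Add(container.UpsertItemAsync(doc, new PartitionKey(doc.Pk)));
        }
        await Task.WhenAll(tasks);
    }
}
```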

Answer 1

Score: 1

The total RU consumed does not increase, but because you are sending more operations per second, the RU consumed per second increases. The goal of bulk mode is to remove the client-side bottleneck and let you take advantage of all the RU available for ingestion.

At the moment there is no way to define an RU limit. You control RU consumption through the size/number of documents you are processing through Bulk, so you can pick a batch size you feel comfortable with based on your RU usage and split your whole data set into subsets of that size, sending one batch at a time.
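
A minimal sketch of that batching idea, assuming Microsoft.Azure.Cosmos v3 with bulk mode already enabled on the client, .NET 6+ for Enumerable.Chunk, and the same hypothetical MyDoc/Pk shape as in the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class BatchedBulkUpsert
{
    // Hypothetical document shape; "id" and the /Pk partition key path are placeholders.
    public record MyDoc(string id, string Pk);

    // Splits the data into fixed-size batches and awaits each batch before
    // sending the next, so RU consumption is bounded by what one batch costs
    // rather than by how fast the client can enqueue operations.
    public static async Task UpsertInBatchesAsync(
        Container container, IEnumerable<MyDoc> docs, int batchSize)
    {
        foreach (var batch in docs.Chunk(batchSize)) // Enumerable.Chunk needs .NET 6+
        {
            var tasks = batch.Select(d =>
                container.UpsertItemAsync(d, new PartitionKey(d.Pk)));
            await Task.WhenAll(tasks);

            // Optional breather so throttled retries can drain before more work is queued.
            await Task.Delay(TimeSpan.FromMilliseconds(200));
        }
    }
}
```

The batch size and the optional delay are the two knobs here: smaller batches and longer pauses trade ingestion speed for headroom under the provisioned RU/s.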

Answer 2

Score: 0

> I am using bulk execution mode because I want to upsert the documents as quickly as possible.

If you get 429s, then you are reaching your goal of "as quickly as possible" for the given RU limit. See Diagnose and troubleshoot Azure Cosmos DB request rate too large (429) exceptions:

> In general, for a production workload, if you see between 1-5% of requests with 429 responses, and your end to end latency is acceptable, this is a healthy sign that the RU/s are being fully utilized. No action is required.

So, 429s are not necessarily a bad thing, unless your client keeps trying to push in too many docs at the same time, overwhelms the RU limit and the retry mechanism completely, and the 429s turn into failed business operations. Then it's a bad sign.

The built-in retry mechanism should be able to avoid hard fails by automatically respecting the x-ms-retry-after-ms header (https://learn.microsoft.com/en-us/rest/api/cosmos-db/common-cosmosdb-rest-response-headers) on retries:
> The number of milliseconds to wait to retry the operation after an initial operation received HTTP status code 429 and was throttled.
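
If the defaults are too tight for an ingestion burst, the v3 SDK also lets you widen that retry budget through CosmosClientOptions. A minimal sketch, with illustrative values rather than recommendations (the SDK defaults are 9 attempts and 30 seconds):

```csharp
using System;
using Microsoft.Azure.Cosmos;

// Sketch: widening the built-in 429 retry budget. The connection string is a placeholder.
var client = new CosmosClient(
    "<connection-string>",
    new CosmosClientOptions
    {
        AllowBulkExecution = true,
        // Retries per rate-limited request before the 429 surfaces to your code (default: 9).
        MaxRetryAttemptsOnRateLimitedRequests = 20,
        // Total time the SDK spends honoring x-ms-retry-after-ms waits (default: 30 seconds).
        MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(120)
    });
```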

As already mentioned by Matias, the key is to help your app avoid piling up too many requests: add batching on the client and wait after each batch for the RU limit to catch up before sending in the next pile. This way the auto-retry mechanism keeps using the full RU limit, the business side slows down when waits are needed, and eventually all docs get sent.

Though, increasing the RU limit to match your ingestion rate also helps ;)
