2021-08-27 06:07:02 · go

UpdateOne fails on client due to timeout, but MongoDB processes it anyway

Question

One of my tests for a function that performs increments using the MongoDB driver for Go is randomly breaking in an unexpected way. Here's what the test does:

1. Create a proxy (with toxiproxy) to a local MongoDB instance.
2. Disable the proxy, so the database looks like it's down.
3. Run a function that does an update that increments a field, timing out after 100ms. If it fails, it keeps retrying every 100ms until the command succeeds.
4. Sleep 1 second.
5. Enable the proxy.
6. Wait for the function to complete and assert that the field has been incremented correctly, that is, only once.

This test is randomly breaking because sometimes that field gets incremented twice. I noticed that it happens when an update is retried just as the proxy gets enabled: the client code receives an "incomplete read of message header: context deadline exceeded" error, which makes it retry the command, but the previous attempt had in fact succeeded, so the field ends up being incremented twice.
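The failure mode can be reproduced without MongoDB or toxiproxy. The following is a minimal sketch, not the code from the test: `fakeServer` and `retryUntilSuccess` are hypothetical names, and the fake server simply applies the write and then pretends the reply was lost once.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fakeServer is a hypothetical stand-in for MongoDB. It always applies the
// increment, but loses the first dropReplies replies, as if the client had
// timed out while reading the response.
type fakeServer struct {
	counter     int
	dropReplies int
}

// increment applies the write, then reports a read timeout while replies
// are still being dropped.
func (s *fakeServer) increment(ctx context.Context) error {
	s.counter++
	if s.dropReplies > 0 {
		s.dropReplies--
		return fmt.Errorf("incomplete read of message header: %w", context.DeadlineExceeded)
	}
	return nil
}

// retryUntilSuccess mirrors the loop described in the question: a timeout
// per attempt and a fixed pause between attempts. It returns the number of
// attempts made.
func retryUntilSuccess(op func(context.Context) error, timeout, pause time.Duration) int {
	attempts := 0
	for {
		attempts++
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		err := op(ctx)
		cancel()
		if err == nil {
			return attempts
		}
		time.Sleep(pause)
	}
}

func main() {
	srv := &fakeServer{dropReplies: 1}
	attempts := retryUntilSuccess(srv.increment, 100*time.Millisecond, 100*time.Millisecond)
	fmt.Println("attempts:", attempts, "counter:", srv.counter)
}
```

The loop treats every error as "nothing happened", so one lost reply is enough to apply the increment twice.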

I took a look at the driver code and I guess it's timing out while reading the server response - perhaps the proxy is enabled just after the update has started and there isn't much timeout left for both write and read operations to complete.

Is there anything that I can do on my side to prevent this from happening? I tried to find a specific error to catch, but I couldn’t find any. Or is this something the driver itself is supposed to handle?

Any help is appreciated.

UPDATE: I looked closely at the error messages and noticed that, while the MongoDB instance was down, all errors were handshake failures. So I made sure the test pings the database before disabling the proxy to get the handshake out of the way, and the test stopped randomly breaking; it ran 1000 times flawlessly, at least. I assume the handshake itself takes time to complete and that contributes to the command timeout.

Answer 1

Score: 1

In general, even if you know the command reached the server, you can't assume anything about its success if you can't read the response.

If all that matters in your case is that the command reached the server, read on.

Unfortunately the current state of the driver (v1.7.1) is not "sophisticated" enough to easily tell if the error is from reading the response.

I was able to reproduce your issue locally. Here is the error when a timeout happens reading the response:

> mongo.CommandError{Code:0, Message:"connection(localhost:27017[-30]) incomplete read of message header: context deadline exceeded", Labels:[]string{"NetworkError", "RetryableWriteError"}, Name:"", Wrapped:topology.ConnectionError{ConnectionID:"localhost:27017[-30]", Wrapped:context.deadlineExceededError{}, init:false, message:"incomplete read of message header"}}

And here is the error when the timeout happens writing the command:

> mongo.CommandError{Code:0, Message:"connection(localhost:27017[-31]) unable to write wire message to network: context deadline exceeded", Labels:[]string{"NetworkError", "RetryableWriteError"}, Name:"", Wrapped:topology.ConnectionError{ConnectionID:"localhost:27017[-31]", Wrapped:context.deadlineExceededError{}, init:false, message:"unable to write wire message to network"}}

As you can see, mongo.CommandError is returned in both cases, with identical Code and Labels fields. That leaves you having to analyze the error string, which is ugly and may break with future changes.

So the best you can do is check if the error string contains "incomplete read of message header", and if so, you don't have to retry. Hopefully this (error support and analysis) improves in the future.
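A minimal sketch of that string check follows. The helper name `mayHaveReachedServer` and the sample errors are hypothetical; the marker text is copied from the v1.7.1 messages quoted above, and other driver versions may word it differently.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// readTimeoutMarker is the message fragment quoted above for driver v1.7.1.
// It is an assumption that other versions use the same wording.
const readTimeoutMarker = "incomplete read of message header"

// Sample errors built from the two messages quoted above.
var (
	sampleReadErr  = errors.New("connection(localhost:27017[-30]) incomplete read of message header: context deadline exceeded")
	sampleWriteErr = errors.New("connection(localhost:27017[-31]) unable to write wire message to network: context deadline exceeded")
)

// mayHaveReachedServer reports whether err looks like a failure that
// happened while reading the reply, meaning the command was already sent.
func mayHaveReachedServer(err error) bool {
	return err != nil && strings.Contains(err.Error(), readTimeoutMarker)
}

func main() {
	fmt.Println(mayHaveReachedServer(sampleReadErr))  // read-side timeout
	fmt.Println(mayHaveReachedServer(sampleWriteErr)) // write-side timeout
}
```

A retry loop could call this helper and stop retrying when it returns true. Keeping the marker in one constant at least limits the damage if the message changes.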

Answer 2

Score: 1

If you are using the retryable writes as implemented by MongoDB 3.6+ and the respective drivers, this shouldn't happen. Each write is accompanied by a transaction number (not to be confused with client-side transactions as implemented by MongoDB 4.0+), and if the same transaction number is used in two consecutive writes there is only one write being done by the server.

This functionality has been around for years so unless you are using an ancient driver version you should already have it.

If you are performing write retries in your application manually rather than using the driver's retryable write functionality, you can end up writing twice, as you found out. The solution is to use the driver's retryable writes.
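To illustrate why a repeated transaction number prevents the double write, here is a toy model. It is a deliberate simplification and not the server's actual implementation: `toyServer` and `writeKey` are hypothetical, and the real bookkeeping is done inside MongoDB.

```go
package main

import "fmt"

// writeKey identifies one logical write: a session plus the transaction
// number the driver attached to it.
type writeKey struct {
	sessionID string
	txnNumber int64
}

// toyServer is a hypothetical stand-in that remembers which writes it has
// already applied.
type toyServer struct {
	counter int
	applied map[writeKey]bool
}

func newToyServer() *toyServer {
	return &toyServer{applied: make(map[writeKey]bool)}
}

// increment applies the write only the first time a given key is seen.
func (s *toyServer) increment(key writeKey) {
	if s.applied[key] {
		return // a retry of a write that already happened
	}
	s.applied[key] = true
	s.counter++
}

func main() {
	srv := newToyServer()
	key := writeKey{sessionID: "session-a", txnNumber: 1}

	srv.increment(key) // original attempt; assume the reply is lost
	srv.increment(key) // retry with the same transaction number
	fmt.Println(srv.counter)

	srv.increment(writeKey{sessionID: "session-a", txnNumber: 2}) // a new write
	fmt.Println(srv.counter)
}
```

An application-level retry issues a fresh command each time, so it behaves like the last call in `main` and increments again. In the Go driver, retryable writes are controlled by the `retryWrites` connection string option or `SetRetryWrites` on the client options. They also require a replica set or sharded cluster, so they will not help against a standalone local instance.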

Answer 3

Score: 0

I had the same problem (running on go.mongodb.org/mongo-driver v1.8.1 on a MongoDB 4.4) and will leave my experiences with this problem here.

To add to @icza's solution:
You can also get a "context deadline exceeded" error, so check for that as well.
A check for a context abort would look something like this:

if strings.Contains(err.Error(), "context") && (strings.Contains(err.Error(), " canceled") || strings.Contains(err.Error(), " deadline exceeded")) {
   ...
}
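A possible alternative to matching on the message text is `errors.Is`. This is a sketch under one assumption: that the driver's error exposes the underlying context error through `Unwrap`, which the `Wrapped` fields in the error dumps from the first answer suggest but which should be verified for your driver version. The helper name `isContextAbort` and the sample errors are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Sample errors: one wraps a context timeout, the other is unrelated.
var (
	wrappedTimeout = fmt.Errorf("incomplete read of message header: %w", context.DeadlineExceeded)
	unrelated      = errors.New("duplicate key")
)

// isContextAbort reports whether err was caused by a canceled or expired
// context anywhere in its chain of wrapped errors.
func isContextAbort(err error) bool {
	return errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(isContextAbort(wrappedTimeout))
	fmt.Println(isContextAbort(unrelated))
}
```

Unlike the string check, this does not depend on the exact wording of the message, only on the error chain being intact.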

My solution to the problem was to check first whether the transaction returned a result, rather than checking first whether there was an error.

Example:

result, err := database.collection.InsertOne(context, item)
if result != nil {
	return result.InsertedID, err
}
return nil, err

If the transaction did process it despite the error, you could add some compensation logic to undo the transaction.
