MongoDB i/o timeout when using clustered Mongo instances

Question

I have an application that uses the upper.io/db package (a fairly simple wrapper around gopkg.in/mgo.v2) to communicate with a Mongo database server. The application creates a session in the main thread on start-up, and then each individual goroutine that needs to make requests to the Mongo server calls Clone on the session and does a defer session.Close on the resulting value. As far as I can tell, this is all standard operating procedure.

This setup works without any errors in our development environments, where we use either a locally run MongoDB or a sandbox instance on MongoLab. Recently we promoted the application to our staging environment, where it talks to a Shared Cluster instance of MongoDB on MongoLab (the cheapest $15 option). This is where the weirdness starts. The first request that goes through (from the first goroutine that gets invoked) comes back with the expected response, but all subsequent ones return:

 read tcp <ip address>:47112: i/o timeout

This happens both from our local development machines pointed at the cluster and from the AWS host for the staging environment. Since the Mongo cluster is hosted by MongoLab, I am going to assume that they've configured everything correctly on their end.

The code is somewhat boring TBH: It literally just opens the session in the main function and maintains a reference to it, and then there are multiple goroutines with this basic structure:

   sess := session.Clone()
   defer sess.Close()

   // make requests to Mongo

During testing, I even restricted it to run only one thing at once (i.e. only one goroutine is active at any given time), and it still fails in the same fashion.
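
For reference, the clone-per-goroutine pattern described above, including the one-at-a-time restriction, can be sketched roughly as follows. The `Session` type here is a hypothetical stand-in that only counts open clones. It is not the mgo or upper.io/db type, and `NewSession`, `OpenClones` and `runWorkers` are names made up for this illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Session is a hypothetical stand-in for a database session; it is NOT the
// mgo or upper.io/db type. It only tracks how many clones are still open.
type Session struct {
	open *int64 // counter shared between the root session and its clones
}

// NewSession returns a root session with zero open clones.
func NewSession() *Session {
	return &Session{open: new(int64)}
}

// Clone returns a session sharing the root's counter and records it as open.
func (s *Session) Clone() *Session {
	atomic.AddInt64(s.open, 1)
	return &Session{open: s.open}
}

// Close marks one clone as released.
func (s *Session) Close() {
	atomic.AddInt64(s.open, -1)
}

// OpenClones reports how many clones have not been closed yet.
func (s *Session) OpenClones() int64 {
	return atomic.LoadInt64(s.open)
}

// runWorkers starts n goroutines, lets at most one of them run at a time,
// and returns the number of clones still open after all of them finish.
func runWorkers(root *Session, n int) int64 {
	var wg sync.WaitGroup
	gate := make(chan struct{}, 1) // capacity 1: one active worker at a time

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			gate <- struct{}{}        // wait for our turn
			defer func() { <-gate }() // let the next worker in

			sess := root.Clone()
			defer sess.Close()
			// make requests to the database here
		}()
	}
	wg.Wait()
	return root.OpenClones()
}

func main() {
	root := NewSession()
	fmt.Println("open clones after run:", runWorkers(root, 10))
}
```

If every clone is closed, the counter is back at zero once all workers are done. A non-zero value would point to a clone that was never released somewhere in the call path.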

Has anybody run into this before? Do I need to configure upper.io/db in a specific fashion? Maybe use mgo directly? I am at my wits' end with this.

Answer 1

Score: 1

In a rather long and grueling process, we finally tracked down where this issue and similar ones like it came from in our program. It ended up being a session leak in the v1 version of the upper.io/db library. The bug and fix are outlined here, but the v1 version of this library is horribly outdated at this point and the later versions do not exhibit this issue.

I doubt this answer will be useful for anybody so late in the game (especially since we ourselves solved it about 3 years ago at this point), but I just wanted to leave the answer here for completeness.
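
As a generic illustration of how a leaked resource can make the first request succeed and later ones time out, here is a toy model. It is not a reconstruction of the actual upper.io/db v1 bug: `Pool`, `Acquire`, `Release` and `request` are hypothetical names invented for this sketch, and the pool size and the 50 ms wait are arbitrary assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrTimeout is returned when no connection slot becomes free in time.
var ErrTimeout = errors.New("timed out waiting for a connection")

// Pool is a hypothetical fixed-size connection pool, used only to model a leak.
type Pool struct {
	slots chan struct{}
}

// NewPool creates a pool with the given number of free slots.
func NewPool(size int) *Pool {
	p := &Pool{slots: make(chan struct{}, size)}
	for i := 0; i < size; i++ {
		p.slots <- struct{}{}
	}
	return p
}

// Acquire takes a slot, or gives up once the timeout has elapsed.
func (p *Pool) Acquire(timeout time.Duration) error {
	select {
	case <-p.slots:
		return nil
	case <-time.After(timeout):
		return ErrTimeout
	}
}

// Release returns a slot to the pool.
func (p *Pool) Release() {
	p.slots <- struct{}{}
}

// request simulates one unit of work. When leak is true it "forgets" to
// release its slot, which is the bug being modelled.
func request(p *Pool, leak bool) error {
	if err := p.Acquire(50 * time.Millisecond); err != nil {
		return err
	}
	if !leak {
		defer p.Release()
	}
	return nil
}

func main() {
	p := NewPool(1)
	fmt.Println("first request: ", request(p, true))  // succeeds, but leaks the only slot
	fmt.Println("second request:", request(p, false)) // no slot left, so it times out
}
```

In this model the leak stays invisible while spare capacity remains, which is one plausible reason a bug of this kind might only surface in a more constrained environment.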

huangapple
  • Posted on September 5, 2015 at 04:01:20
  • Please keep this link when reposting: https://go.coder-hub.com/32405742.html