Mongodb查询例程产生了巨大的堆栈跟踪。

huangapple go评论93阅读模式
英文:

Mongodb queries on routines produces huge stack trace

问题

我正在一个Go程序中进行大量的Web调用来访问一个API,并将结果存储在数据库中(使用mgo)。这些API调用是在单独的Go协程中完成的。在其他协程中,我从数据库中提取信息并进行处理,然后更新数据库。当数据被放回时,会设置一个标志,以便知道该数据已经进行了后处理,因此当程序请求数据库返回另一个条目进行后处理时,数据库会返回一个标志为false的条目。当标志设置为true时,Go协程会关闭:wg.done()

一切都很好,我有很多打印输出告诉我程序的运行情况,然而在运行结束时,我得到了一个包含很多相同内容的巨大堆栈跟踪:

> goroutine 56731 [sleep]: time.Sleep(0x12a05f200)
> /usr/local/Cellar/go/1.5/libexec/src/runtime/time.go:59 +0xf9
> gopkg.in/mgo%2ev2.(*mongoServer).pinger(0xc82601b420, 0x1)
> /Users/alex/go/src/gopkg.in/mgo.v2/server.go:295 +0x1b4 created by
> gopkg.in/mgo%2ev2.newServer
> /Users/alex/go/src/gopkg.in/mgo.v2/server.go:88 +0x162
>
> goroutine 56698 [sleep]: time.Sleep(0x12a05f200)
> /usr/local/Cellar/go/1.5/libexec/src/runtime/time.go:59 +0xf9
> gopkg.in/mgo%2ev2.(*mongoServer).pinger(0xc82601bce0, 0x1)
> /Users/alex/go/src/gopkg.in/mgo.v2/server.go:295 +0x1b4 created by
> gopkg.in/mgo%2ev2.newServer
> /Users/alex/go/src/gopkg.in/mgo.v2/server.go:88 +0x162
>
> goroutine 56699 [sleep]: time.Sleep(0x1dcd6500)
> /usr/local/Cellar/go/1.5/libexec/src/runtime/time.go:59 +0xf9
> gopkg.in/mgo%2ev2.(*mongoCluster).syncServersLoop(0xc8256425a0)
> /Users/alex/go/src/gopkg.in/mgo.v2/cluster.go:353 +0x2b1 created by
> gopkg.in/mgo%2ev2.newCluster
> /Users/alex/go/src/gopkg.in/mgo.v2/cluster.go:73 +0x1a0
>
> goroutine 56738 [sleep]: time.Sleep(0x12a05f200)
> /usr/local/Cellar/go/1.5/libexec/src/runtime/time.go:59 +0xf9
> gopkg.in/mgo%2ev2.(*mongoServer).pinger(0xc82606fa40, 0x1)
> /Users/alex/go/src/gopkg.in/mgo.v2/server.go:295 +0x1b4 created by
> gopkg.in/mgo%2ev2.newServer
> /Users/alex/go/src/gopkg.in/mgo.v2/server.go:88 +0x162

在所有这些内容中,有一件事与其他内容不同,它是堆栈跟踪中唯一不同的输出(上面只是一个示例,我的终端无法滚动到开头,因为有太多内容):

> goroutine 57201 [IO wait]: net.runtime_pollWait(0xedb6f0, 0x72,
> 0xc82000a2c0)
> /usr/local/Cellar/go/1.5/libexec/src/runtime/netpoll.go:157 +0x60
> net.(*pollDesc).Wait(0xc827b0e5a0, 0x72, 0x0, 0x0)
> /usr/local/Cellar/go/1.5/libexec/src/net/fd_poll_runtime.go:73 +0x3a
> net.(*pollDesc).WaitRead(0xc827b0e5a0, 0x0, 0x0)
> /usr/local/Cellar/go/1.5/libexec/src/net/fd_poll_runtime.go:78 +0x36
> net.(*netFD).Read(0xc827b0e540, 0xc828d61000, 0x1000, 0x1000, 0x0,
> 0x754050, 0xc82000a2c0)
> /usr/local/Cellar/go/1.5/libexec/src/net/fd_unix.go:232 +0x23a
> net.(*conn).Read(0xc8260eac38, 0xc828d61000, 0x1000, 0x1000, 0x0, 0x0,
> 0x0) /usr/local/Cellar/go/1.5/libexec/src/net/net.go:172 +0xe4
> net/http.noteEOFReader.Read(0x7960c0, 0xc8260eac38, 0xc82751fd38,
> 0xc828d61000, 0x1000, 0x1000, 0xc82644dc20, 0x0, 0x0)
> /usr/local/Cellar/go/1.5/libexec/src/net/http/transport.go:1370 +0x67
> net/http.(*noteEOFReader).Read(0xc826116e60, 0xc828d61000, 0x1000,
> 0x1000, 0xc827d1a770, 0x0, 0x0) <autogenerated>:126 +0xd0
> bufio.(*Reader).fill(0xc82644d4a0)
> /usr/local/Cellar/go/1.5/libexec/src/bufio/bufio.go:97 +0x1e9
> bufio.(*Reader).Peek(0xc82644d4a0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
> /usr/local/Cellar/go/1.5/libexec/src/bufio/bufio.go:132 +0xcc
> net/http.(*persistConn).readLoop(0xc82751fce0)
> /usr/local/Cellar/go/1.5/libexec/src/net/http/transport.go:876 +0xf7
> created by net/http.(*Transport).dialConn
> /usr/local/Cellar/go/1.5/libexec/src/net/http/transport.go:685 +0xc78

我很难理解它告诉我的是什么,它是在写入数据库时发生了锁定,还是协程没有关闭而导致超时,我不知道。顺便说一下,我正在使用Go 1.5。

与数据库交互的代码如下:

func (l *Ledger) addRaceToDatabase(race Race) { //true if added, 

false if existed
  session, err := mgo.Dial("127.0.0.1")
  if err != nil {
    panic(err)
  }

  defer session.Close()

  session.SetMode(mgo.Monotonic, true)

  c := session.DB("collection").C("races")

    // Index
  index := mgo.Index{
    Key:        []string{"id"},
    Unique:     true,
    DropDups:   true,
    Background: true,
    Sparse:     true,
  }

  err = c.EnsureIndex(index)
  if err != nil {
    panic(err)
  }

  result := Race{}
  //if the race exists, don't add it to the database
  err = c.Find(bson.M{"id": race.ID}).One(&result)
  if err != nil {
  	//if there is an error there wasn't an entry so add this to the database
    err = c.Insert(race)
	if err != nil {
		panic(err)
	}	
  } else {
  	//if it does find an entry, it will print it
  	fmt.Println("FOUND: ", result.ID)
  }
}
英文:

I am doing lots of web calls to an api in a go program, and the results are stored in a database (using mgo). The api calls are done on separate go routines. On other routines, I am pulling the information out of the database and processing it, before updating the database. When the data is put back, a flag is set so that it is known that that data has been post processed so when the program asks the database for another entry to post process the database hands back one where the flag complete is set to false. When the flag is set to true, the go routine is shutdown: wg.done().

All is good, I have lots of print outs telling me how the program is getting on, however towards the end of its running I get a huge stack trace containing lots of the same:

> goroutine 56731 [sleep]: time.Sleep(0x12a05f200)
> /usr/local/Cellar/go/1.5/libexec/src/runtime/time.go:59 +0xf9
> gopkg.in/mgo%2ev2.(*mongoServer).pinger(0xc82601b420, 0x1)
> /Users/alex/go/src/gopkg.in/mgo.v2/server.go:295 +0x1b4 created by
> gopkg.in/mgo%2ev2.newServer
> /Users/alex/go/src/gopkg.in/mgo.v2/server.go:88 +0x162
>
> goroutine 56698 [sleep]: time.Sleep(0x12a05f200)
> /usr/local/Cellar/go/1.5/libexec/src/runtime/time.go:59 +0xf9
> gopkg.in/mgo%2ev2.(*mongoServer).pinger(0xc82601bce0, 0x1)
> /Users/alex/go/src/gopkg.in/mgo.v2/server.go:295 +0x1b4 created by
> gopkg.in/mgo%2ev2.newServer
> /Users/alex/go/src/gopkg.in/mgo.v2/server.go:88 +0x162
>
> goroutine 56699 [sleep]: time.Sleep(0x1dcd6500)
> /usr/local/Cellar/go/1.5/libexec/src/runtime/time.go:59 +0xf9
> gopkg.in/mgo%2ev2.(*mongoCluster).syncServersLoop(0xc8256425a0)
> /Users/alex/go/src/gopkg.in/mgo.v2/cluster.go:353 +0x2b1 created by
> gopkg.in/mgo%2ev2.newCluster
> /Users/alex/go/src/gopkg.in/mgo.v2/cluster.go:73 +0x1a0
>
> goroutine 56738 [sleep]: time.Sleep(0x12a05f200)
> /usr/local/Cellar/go/1.5/libexec/src/runtime/time.go:59 +0xf9
> gopkg.in/mgo%2ev2.(*mongoServer).pinger(0xc82606fa40, 0x1)
> /Users/alex/go/src/gopkg.in/mgo.v2/server.go:295 +0x1b4 created by
> gopkg.in/mgo%2ev2.newServer
> /Users/alex/go/src/gopkg.in/mgo.v2/server.go:88 +0x162

There is one thing amongst all of those that is below, which is the only different output on the stack trace to the above (the above is just a sample, my terminal can't scroll back to the beginning there is so many)

> goroutine 57201 [IO wait]: net.runtime_pollWait(0xedb6f0, 0x72,
> 0xc82000a2c0)
> /usr/local/Cellar/go/1.5/libexec/src/runtime/netpoll.go:157 +0x60
> net.(*pollDesc).Wait(0xc827b0e5a0, 0x72, 0x0, 0x0)
> /usr/local/Cellar/go/1.5/libexec/src/net/fd_poll_runtime.go:73 +0x3a
> net.(*pollDesc).WaitRead(0xc827b0e5a0, 0x0, 0x0)
> /usr/local/Cellar/go/1.5/libexec/src/net/fd_poll_runtime.go:78 +0x36
> net.(*netFD).Read(0xc827b0e540, 0xc828d61000, 0x1000, 0x1000, 0x0,
> 0x754050, 0xc82000a2c0)
> /usr/local/Cellar/go/1.5/libexec/src/net/fd_unix.go:232 +0x23a
> net.(*conn).Read(0xc8260eac38, 0xc828d61000, 0x1000, 0x1000, 0x0, 0x0,
> 0x0) /usr/local/Cellar/go/1.5/libexec/src/net/net.go:172 +0xe4
> net/http.noteEOFReader.Read(0x7960c0, 0xc8260eac38, 0xc82751fd38,
> 0xc828d61000, 0x1000, 0x1000, 0xc82644dc20, 0x0, 0x0)
> /usr/local/Cellar/go/1.5/libexec/src/net/http/transport.go:1370 +0x67
> net/http.(*noteEOFReader).Read(0xc826116e60, 0xc828d61000, 0x1000,
> 0x1000, 0xc827d1a770, 0x0, 0x0) <autogenerated>:126 +0xd0
> bufio.(*Reader).fill(0xc82644d4a0)
> /usr/local/Cellar/go/1.5/libexec/src/bufio/bufio.go:97 +0x1e9
> bufio.(*Reader).Peek(0xc82644d4a0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0)
> /usr/local/Cellar/go/1.5/libexec/src/bufio/bufio.go:132 +0xcc
> net/http.(*persistConn).readLoop(0xc82751fce0)
> /usr/local/Cellar/go/1.5/libexec/src/net/http/transport.go:876 +0xf7
> created by net/http.(*Transport).dialConn
> /usr/local/Cellar/go/1.5/libexec/src/net/http/transport.go:685 +0xc78

I am struggling to work out what it is telling me, whether its locking on writing to the database, whether the routines aren't closing and something is timing out, I don't know.
I am using go 1.5 btw.

The code that talks with the database is below:

func (l *Ledger) addRaceToDatabase(race Race) { //true if added, 

false if existed
  session, err := mgo.Dial(&quot;127.0.0.1&quot;)
  if err != nil {
    panic(err)
  }

  defer session.Close()

  session.SetMode(mgo.Monotonic, true)

  c := session.DB(&quot;collection&quot;).C(&quot;races&quot;)

    // Index
  index := mgo.Index{
    Key:        []string{&quot;id&quot;},
    Unique:     true,
    DropDups:   true,
    Background: true,
    Sparse:     true,
  }

  err = c.EnsureIndex(index)
  if err != nil {
    panic(err)
  }

  result := Race{}
  //if the race exists, don&#39;t add it to the database
  err = c.Find(bson.M{&quot;id&quot;: race.ID}).One(&amp;result)
  if err != nil {
  	//if there is an error there wasn&#39;t an entry so add this to the database
    err = c.Insert(race)
	if err != nil {
		panic(err)
	}	
  } else {
  	//if it does find an entry, it will print it
  	fmt.Println(&quot;FOUND: &quot;, result.ID)
  }
}

答案1

得分: 0

看起来引用的逻辑每次需要与MongoDB进行交互时都会拨号,这不是在HTTP服务器中持续数据库通信的适当方式。

每次你拨号到MongoDB服务器(或副本集,或mongos),都会启动一些后台活动,以保持集群拓扑在内存中的最新状态,并维护连接池的健康状态(请注意,所有在单个Dial调用之后创建的会话将共享一个连接池)。

每次想要与数据库通信时都调用Dial意味着所有这些逻辑都变得独立,对于每个拨号都是如此,而且在最后一个会话关闭后,后台活动可能需要一段时间才能关闭。

因此,尽管它可以工作,但它非常低效,因为它必须重新了解正在使用的拓扑结构,它无法使用现有连接池中的连接,并且还会创建不必要的后台活动,这就是你在回溯中观察到的情况。

关于如何在这种情况下更好地维护会话的一些建议,请参阅以下主题:

英文:

It looks like the quoted logic is dialing MongoDB every time it needs to do something with it, which isn't the appropriate way to hold sustained database communication within an HTTP server.

Every time you Dial a MongoDB server (or replica set, or mongos), some background activity is started to keep the cluster topology up-to-date in memory, and to maintain the pool of connections sane (note that all sessions that are created after a single Dial call will share a single pool of connections).

Calling Dial every time one wants to talk to the database means all of that logic becomes independent for each of the dials, and that background activity may take a moment to shut down after the last session is closed.

So even though it will work, it's very inefficient as it has to learn again about the topology in use, it fails to use connections from an existing pool, and it also creates unnecessary background churn which is what you are observing in your backtrace.

For some ideas on how to better maintain a session in this context, please see this topic:

huangapple
  • 本文由 发表于 2015年10月19日 07:25:33
  • 转载请务必保留本文链接:https://go.coder-hub.com/33204397.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定