Efficient paging in MongoDB using mgo


Question


I searched and found no Go solution to this problem, with or without mgo.v2, on Stack Overflow or on any other site. This Q&A is in the spirit of knowledge sharing and documentation.


Let's say we have a users collection in MongoDB modeled with this Go struct:

type User struct {
    ID      bson.ObjectId `bson:"_id"`
    Name    string        `bson:"name"`
    Country string        `bson:"country"`
}

We want to sort and list users based on some criteria, but have paging implemented due to the expected long result list.

To page through the results of a query, MongoDB and the mgo.v2 driver package have built-in support in the form of Query.Skip() and Query.Limit(), e.g.:

session, err := mgo.Dial(url) // Acquire Mongo session, handle error!

c := session.DB("").C("users")
q := c.Find(bson.M{"country": "USA"}).Sort("name", "_id").Limit(10)

// To get the nth page:
q = q.Skip((n-1)*10)

var users []*User
err = q.All(&users)

This becomes slow as the page number grows, however: MongoDB can't just "magically" jump to the xth document in the result. It has to iterate over all the result documents and omit (not return) the first x that need to be skipped.
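To make the cost difference concrete, here is a minimal, stdlib-only sketch that models an index as a sorted slice of strings. It is not MongoDB code: pageBySkip and pageBySeek are hypothetical helpers, and the visited counter only approximates how many index entries a server would have to touch.

```go
package main

import (
	"fmt"
	"sort"
)

// pageBySkip models Skip()+Limit(): it walks the index from the start and
// discards the first skip entries. visited counts the entries it touched.
func pageBySkip(index []string, skip, limit int) (page []string, visited int) {
	for i, key := range index {
		if len(page) == limit {
			break
		}
		visited++
		if i >= skip {
			page = append(page, key)
		}
	}
	return page, visited
}

// pageBySeek models a min-style range start: it binary-searches for the first
// key >= min and reads up to limit entries from there. visited counts only
// the entries read after the seek, not the binary search steps.
func pageBySeek(index []string, min string, limit int) (page []string, visited int) {
	start := sort.SearchStrings(index, min)
	for i := start; i < len(index) && len(page) < limit; i++ {
		visited++
		page = append(page, index[i])
	}
	return page, visited
}

func main() {
	index := make([]string, 0, 1000)
	for i := 0; i < 1000; i++ {
		index = append(index, fmt.Sprintf("user%04d", i)) // already sorted
	}
	_, bySkip := pageBySkip(index, 990, 10)
	_, bySeek := pageBySeek(index, "user0990", 10)
	fmt.Println("skip visited:", bySkip, "seek visited:", bySeek)
}
```

In this model the skip variant touches every entry up to the end of the requested page, while the seek variant touches only the page itself, which is the behavior the rest of this post tries to get out of MongoDB.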

MongoDB provides the right solution: If the query operates on an index (it has to work on an index), cursor.min() can be used to specify the first index entry to start listing results from.

This Stack Overflow answer shows how it can be done using a mongo client: https://stackoverflow.com/questions/5525304/how-to-do-pagination-using-range-queries-in-mongodb/5526907#5526907

Note: the required index for the above query would be:

db.users.createIndex(
    {
        country: 1,
        name: 1,
        _id: 1
    }
)

There is one problem though: the mgo.v2 package has no support for specifying min().

How can we achieve efficient paging that uses MongoDB's cursor.min() feature using the mgo.v2 driver?

Answer 1

Score: 21


Unfortunately the mgo.v2 driver does not provide API calls to specify cursor.min().

But there is a solution. The mgo.Database type provides a Database.Run() method to run any MongoDB command. The available commands and their documentation are listed in the MongoDB manual under "Database commands".

Starting with MongoDB 3.2, a new find command is available which can be used to execute queries, and it supports specifying the min argument that denotes the first index entry to start listing results from.

Good. After each batch (the documents of one page), we need to generate the min document from the last document of the query result. It must contain the values of the index entry that was used to execute the query. The next batch (the documents of the next page) can then be acquired by setting this min index entry before executing the query.

This index entry (let's call it the cursor from now on) may be encoded to a string and sent to the client along with the results. When the client wants the next page, it sends back the cursor, saying it wants the results that come after this cursor.

Doing it manually (the "hard" way)

The command to be executed can be in different forms, but the command name (find) must be first in the marshaled result, so we'll use bson.D (which preserves order in contrast to bson.M):

limit := 10
cmd := bson.D{
    {Name: "find", Value: "users"},
    {Name: "filter", Value: bson.M{"country": "USA"}},
    // sort is a single ordered document, hence bson.D (not []bson.D)
    {Name: "sort", Value: bson.D{
        {Name: "name", Value: 1},
        {Name: "_id", Value: 1},
    }},
    {Name: "limit", Value: limit},
    {Name: "batchSize", Value: limit},
    {Name: "singleBatch", Value: true},
}
// min is the bson.D cursor data built from the previous page's last
// document (nil for the first page)
if min != nil {
    // min is inclusive, must skip first (which is the previous last)
    cmd = append(cmd,
        bson.DocElem{Name: "skip", Value: 1},
        bson.DocElem{Name: "min", Value: min},
    )
}
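The combination of an inclusive min and skip: 1 can be sketched without a database. The following stdlib-only model treats the {country, name, _id} index as a sorted slice; key, less and nextPage are hypothetical names introduced for illustration, and IDs are plain strings rather than ObjectIds.

```go
package main

import (
	"fmt"
	"sort"
)

// key mirrors one entry of the {country, name, _id} index.
type key struct {
	Country, Name, ID string
}

// less reports whether a sorts before b, comparing fields in index order.
func less(a, b key) bool {
	if a.Country != b.Country {
		return a.Country < b.Country
	}
	if a.Name != b.Name {
		return a.Name < b.Name
	}
	return a.ID < b.ID
}

// nextPage returns up to limit entries of the sorted index. A nil min means
// "first page". Otherwise it seeks to the first entry >= min (inclusive
// lower bound) and then skips one entry, like {skip: 1} in the command.
// Note that it skips one entry even if the min entry no longer exists.
func nextPage(index []key, min *key, limit int) []key {
	start := 0
	if min != nil {
		start = sort.Search(len(index), func(i int) bool {
			return !less(index[i], *min)
		})
		if start < len(index) {
			start++ // drop the last entry of the previous page
		}
	}
	end := start + limit
	if end > len(index) {
		end = len(index)
	}
	return index[start:end]
}

func main() {
	index := []key{ // must be sorted according to less
		{"USA", "Alice", "1"},
		{"USA", "Bob", "2"},
		{"USA", "Bob", "3"},
		{"USA", "Carol", "4"},
		{"USA", "Dave", "5"},
	}
	var min *key
	for {
		page := nextPage(index, min, 2)
		if len(page) == 0 {
			break
		}
		fmt.Println(page)
		last := page[len(page)-1]
		min = &last
	}
}
```

Including _id as the last index field is what keeps the two "Bob" entries apart: without a unique tie-breaker, the seek could not tell which of them ended the previous page.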

The result of executing a MongoDB find command with Database.Run() can be captured with the following type:

var res struct {
    OK       int `bson:"ok"`
    WaitedMS int `bson:"waitedMS"`
    Cursor   struct {
        ID         interface{} `bson:"id"`
        NS         string      `bson:"ns"`
        FirstBatch []bson.Raw  `bson:"firstBatch"`
    } `bson:"cursor"`
}

db := session.DB("")
if err := db.Run(cmd, &res); err != nil {
    // Handle error (abort)
}

We now have the results, but in a slice of type []bson.Raw, and we want them in a slice of type []*User. This is where Collection.NewIter() comes in handy. It can transform (unmarshal) a value of type []bson.Raw into any type we usually pass to Query.All() or Iter.All(). Let's see it:

firstBatch := res.Cursor.FirstBatch
var users []*User
err = db.C("users").NewIter(nil, firstBatch, 0, nil).All(&users)

We now have the users of the next page. Only one thing left: generating the cursor to be used to get the subsequent page should we ever need it:

if len(users) > 0 {
    lastUser := users[len(users)-1]
    // One ordered document holding the index entry values, hence bson.D
    cursorData := bson.D{
        {Name: "country", Value: lastUser.Country},
        {Name: "name", Value: lastUser.Name},
        {Name: "_id", Value: lastUser.ID},
    }
    // Encode cursorData to a string and return it along with the users
} else {
    // No more users found, use the last cursor
}

This is all good, but how do we convert a cursorData to string and vice versa? We may use bson.Marshal() and bson.Unmarshal() combined with base64 encoding; the use of base64.RawURLEncoding will give us a web-safe cursor string, one that can be added to URL queries without escaping.

Here's an example implementation:

// CreateCursor returns a web-safe cursor string from the specified fields.
// The returned cursor string is safe to include in URL queries without escaping.
func CreateCursor(cursorData bson.D) (string, error) {
    // bson.Marshal() is not expected to fail for plain cursor values, so there
    // is no check and early return (the error is still returned to the caller
    // if it ever happens)
    data, err := bson.Marshal(cursorData)
    return base64.RawURLEncoding.EncodeToString(data), err
}

// ParseCursor parses the cursor string and returns the cursor data.
func ParseCursor(c string) (cursorData bson.D, err error) {
    var data []byte
    if data, err = base64.RawURLEncoding.DecodeString(c); err != nil {
        return
    }

    err = bson.Unmarshal(data, &cursorData)
    return
}
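To check the "no escaping needed" property without a MongoDB driver at hand, the same round trip can be sketched with the standard library only. This variant swaps BSON for encoding/json purely to avoid third-party imports, so it is not the encoding used above and it would not preserve BSON types such as ObjectId. The names field, encodeCursor, decodeCursor and isURLSafe are made up for this sketch.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/url"
)

// field is one named cursor value; a slice of fields keeps its order,
// much like bson.D does.
type field struct {
	Name  string      `json:"n"`
	Value interface{} `json:"v"`
}

// encodeCursor serializes the fields (JSON here, as an assumption) and
// encodes the bytes with the unpadded URL-safe base64 alphabet.
func encodeCursor(fields []field) (string, error) {
	data, err := json.Marshal(fields)
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(data), nil
}

// decodeCursor reverses encodeCursor and rejects malformed input.
func decodeCursor(c string) ([]field, error) {
	data, err := base64.RawURLEncoding.DecodeString(c)
	if err != nil {
		return nil, err
	}
	var fields []field
	if err := json.Unmarshal(data, &fields); err != nil {
		return nil, err
	}
	return fields, nil
}

// isURLSafe reports whether c survives query escaping unchanged.
func isURLSafe(c string) bool {
	return url.QueryEscape(c) == c
}

func main() {
	c, err := encodeCursor([]field{
		{Name: "country", Value: "USA"},
		{Name: "name", Value: "Bob"},
		{Name: "_id", Value: "abc123"},
	})
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	fmt.Println("cursor:", c, "URL-safe:", isURLSafe(c))

	fields, err := decodeCursor(c)
	fmt.Println(fields, err)
}
```

Since the cursor comes back from the client, it should be treated as untrusted input: a decode error is a reason to reject the request rather than to fall back to the first page silently.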

And we finally have our efficient, but not so short MongoDB mgo paging functionality. Read on...

Using github.com/icza/minquery (the "easy" way)

The manual way is quite lengthy; it can be made general and automated. This is where github.com/icza/minquery comes into the picture (disclosure: I'm the author). It provides a wrapper to configure and execute a MongoDB find command, allowing you to specify a cursor, and after executing the query, it gives you back the new cursor to be used to query the next batch of results. The wrapper is the MinQuery type which is very similar to mgo.Query but it supports specifying MongoDB's min via the MinQuery.Cursor() method.

The above solution using minquery looks like this:

q := minquery.New(session.DB(""), "users", bson.M{"country": "USA"}).
    Sort("name", "_id").Limit(10)
// If this is not the first page, set cursor:
// getLastCursor() represents your logic how you acquire the last cursor.
if cursor := getLastCursor(); cursor != "" {
    q = q.Cursor(cursor)
}

var users []*User
newCursor, err := q.All(&users, "country", "name", "_id")

And that's all. newCursor is the cursor to be used to fetch the next batch.

Note #1: When calling MinQuery.All(), you have to provide the names of the cursor fields. These are used to build the cursor data (and ultimately the cursor string).

Note #2: If you're retrieving partial results (by using MinQuery.Select()), you have to include all the fields that are part of the cursor (the index entry) even if you don't intend to use them directly, else MinQuery.All() will not have all the values of the cursor fields, and so it will not be able to create the proper cursor value.
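On the calling side, the returned cursor is simply fed back into the next query. The loop below is a rough, stdlib-only sketch of that pattern: fetchFunc is a hypothetical stand-in for "run the query with this cursor", and fakeFetch pages over an in-memory slice using an offset as its cursor (a real cursor would carry index values instead).

```go
package main

import (
	"fmt"
	"strconv"
)

// fetchFunc is a hypothetical stand-in for executing the paged query.
// It returns one page of results and the cursor for the following page.
type fetchFunc func(cursor string) (page []string, next string, err error)

// fetchAll calls fetch repeatedly, feeding each returned cursor into the
// next call, until a page comes back empty.
func fetchAll(fetch fetchFunc) ([]string, error) {
	var all []string
	cursor := "" // the empty cursor means "first page"
	for {
		page, next, err := fetch(cursor)
		if err != nil {
			return nil, err
		}
		if len(page) == 0 {
			return all, nil
		}
		all = append(all, page...)
		cursor = next
	}
}

// fakeFetch builds a fetchFunc over data. Its cursor is just the next
// offset as a decimal string, which is enough to exercise the loop.
func fakeFetch(data []string, limit int) fetchFunc {
	return func(cursor string) ([]string, string, error) {
		start := 0
		if cursor != "" {
			n, err := strconv.Atoi(cursor)
			if err != nil {
				return nil, "", err
			}
			start = n
		}
		if start < 0 || start > len(data) {
			start = len(data)
		}
		end := start + limit
		if end > len(data) {
			end = len(data)
		}
		return data[start:end], strconv.Itoa(end), nil
	}
}

func main() {
	users := []string{"Alice", "Bob", "Carol", "Dave", "Eve"}
	all, err := fetchAll(fakeFetch(users, 2))
	fmt.Println(all, err)
}
```

In a web application the loop usually spans several requests: each response carries the new cursor, and the client sends it back when it asks for the next page.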

Check out the package doc of minquery here: https://godoc.org/github.com/icza/minquery. It is rather short and hopefully clean.

huangapple
  • Posted on 2016-11-16 22:38:07
  • When reposting, please keep this article's link: https://go.coder-hub.com/40634865.html