
Pagination with batch queries? Is it possible to batch gets from the datastore and get a cursor?

Question


I am currently requesting 20 entries from the datastore and returning them to the user together with a cursor; if the user asks for more entries, the cursor is used as the new starting point to request the next 20 entries.

The code looks something like this:

q := datastore.NewQuery("Item").
	Limit(limit)

if cursor, err := datastore.DecodeCursor(cursor); err == nil {
	q = q.Start(cursor)
}

var is []Item
t := q.Run(c)
for {
	var i Item
	_, err := t.Next(&i)
	if err == datastore.Done {
		break
	}
	if err != nil {
		// A non-Done error must also end the loop,
		// otherwise it would spin forever.
		return nil, err
	}

	is = append(is, i)
}

In case it is important here is the complete code: https://github.com/koffeinsource/kaffeeshare/blob/master/data/appengine.go#L23

It looks like an anti-pattern to use a loop with an append, but I don't see a way to get a cursor when using GetMulti/GetAll — or am I missing something?

I do expect data to be added while users are querying the datastore, so an offset may produce duplicate results. Should I care about batch gets in this case?

Answer 1

Score: 1


Your approach is perfectly fine, in fact, it is the best way on AppEngine.

Querying subsequent entities by setting a start cursor will not give you duplicate results, even if a new record is inserted that would, for example, now be the first result.

Why? Because the cursor encodes the key of the last returned entity, not the number of previously returned entities.

So if you set a cursor, the datastore will start listing and returning entities that come after the key encoded in the cursor. If a new entity is saved that comes after the cursor, then that entity will be returned when reached.
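This behaviour can be illustrated with a small sketch (this is plain Go, not the App Engine API: the "datastore" is a sorted slice of integer keys, and the "cursor" is simply the last key returned — both are stand-ins chosen for the example):

```go
package main

import (
	"fmt"
	"sort"
)

// page returns up to limit keys strictly greater than afterKey,
// mimicking q.Start(cursor). Use afterKey = -1 for the first page.
func page(store []int, afterKey, limit int) (keys []int, cursor int) {
	sorted := append([]int(nil), store...)
	sort.Ints(sorted)
	cursor = afterKey
	for _, k := range sorted {
		if k > afterKey && len(keys) < limit {
			keys = append(keys, k)
			cursor = k
		}
	}
	return keys, cursor
}

func main() {
	store := []int{10, 20, 30, 40}

	first, cur := page(store, -1, 2)
	fmt.Println(first) // [10 20]

	// A new entity is inserted before the next page is requested;
	// it sorts before the cursor key (20).
	store = append(store, 15)

	second, _ := page(store, cur, 2)
	fmt.Println(second) // [30 40] -- nothing is repeated

	// With an offset of 2 instead of a cursor, the second page would
	// be the sorted slice [10 15 20 30 40][2:4] = [20 30]:
	// entity 20 would be returned twice.
}
```

The key-based cursor resumes after the last key it saw, so an insert before that key is simply skipped, while an offset counts entities from the start and shifts under you.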

Also, using for with append() is the idiomatic way. You might optimize it a little by creating a big enough slice beforehand:

var is = make([]Item, 0, limit)

But note that I made it with 0 length and limit capacity on purpose: there is no guarantee that there will be enough entities to fill the full slice.

Another optimization would be to allocate it to be limit length:

var is = make([]Item, limit)

and when datastore.Done is reached, reslice it if it is not filled fully, for example:

for idx := 0; ; idx++ {
    var i Item
    _, err := t.Next(&i)
    if err == datastore.Done {
        if idx < len(is) {
            is = is[:idx] // Reslice as it is not filled fully
        }
        break
    }

    is[idx] = i
}
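Both allocation strategies can be checked with plain slices, independent of the datastore (the Item type and the counts below are placeholders for the example):

```go
package main

import "fmt"

type Item struct{ ID int }

// fillWithAppend uses strategy 1: zero length, limit capacity.
// append fills the backing array without ever reallocating.
func fillWithAppend(limit, got int) []Item {
	is := make([]Item, 0, limit)
	for i := 0; i < got; i++ {
		is = append(is, Item{ID: i})
	}
	return is
}

// fillAndReslice uses strategy 2: limit length up front, then a
// reslice to the number of entities actually filled in.
func fillAndReslice(limit, got int) []Item {
	is := make([]Item, limit)
	idx := 0
	for ; idx < got; idx++ {
		is[idx] = Item{ID: idx}
	}
	if idx < len(is) {
		is = is[:idx] // reslice, as in the loop above
	}
	return is
}

func main() {
	a := fillWithAppend(20, 5)
	b := fillAndReslice(20, 5)
	fmt.Println(len(a), cap(a)) // 5 20
	fmt.Println(len(b), cap(b)) // 5 20
}
```

Either way the final slice has the exact length and no reallocation happened; reslicing only changes the length, the capacity stays at limit.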

Batch operations

> GetMulti, PutMulti and DeleteMulti are batch versions of the Get, Put and Delete functions. They take a []*Key instead of a *Key, and may return an appengine.MultiError when encountering partial failure.

Batch operations are not a replacement or alternative to queries. GetMulti for example requires you to already have all the keys prepared for which you want to get the complete entities. And as such, there is no sense of a cursor for these batch operations.

Batch operations return all the requested information (or perform all the requested operations). There is no sequence of entities or operations that could be stopped and resumed later.
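The contrast can be sketched in plain Go (an assumption-laden stand-in, not the App Engine API: the "datastore" is a map, and getMulti loosely mirrors how GetMulti reports partial failures via appengine.MultiError):

```go
package main

import "fmt"

type Item struct{ Name string }

// getMulti looks up all requested keys at once. Every key must be
// known up front -- there is nothing to paginate, unlike a query,
// which discovers keys as it runs.
func getMulti(store map[int64]Item, keys []int64) ([]Item, []error) {
	items := make([]Item, len(keys))
	errs := make([]error, len(keys))
	failed := false
	for i, k := range keys {
		it, ok := store[k]
		if !ok {
			errs[i] = fmt.Errorf("no such entity: %d", k)
			failed = true
			continue
		}
		items[i] = it
	}
	if !failed {
		errs = nil // mirror GetMulti: nil error on full success
	}
	return items, errs
}

func main() {
	store := map[int64]Item{1: {"a"}, 2: {"b"}}
	items, errs := getMulti(store, []int64{1, 2, 3})
	fmt.Println(len(items), errs != nil) // 3 true: key 3 is missing
}
```

Because the caller supplies the complete key list, the operation either covers all of it or reports per-key failures; there is no intermediate position a cursor could encode.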

Queries and batch operations serve different purposes. You shouldn't worry about query and cursor performance. They perform well and, more importantly, the Datastore scales well. A cursor will not slow down a query: a query with a cursor runs just as fast as one without, and previously returned entities do not affect query execution time. It doesn't matter whether you run a query without a cursor or with a cursor acquired after fetching a million entities (which is only possible over several iterations).

huangapple
  • Posted on 2015-08-09 17:46:54
  • Please keep this link when republishing: https://go.coder-hub.com/31902903.html