Pagination with batch queries? Is it possible to batch gets from the datastore and get a cursor?

Question
I am currently requesting 20 entries from the datastore, return these to the user with a cursor and in case the user is asking for more entries use the cursor as a new start and ask for the next 20 entries.
The code looks something like:

	q := datastore.NewQuery("Item").Limit(limit)
	if cursor, err := datastore.DecodeCursor(cursor); err == nil {
		q = q.Start(cursor)
	}

	var is []Item
	t := q.Run(c)
	for {
		var i Item
		_, err := t.Next(&i)
		if err == datastore.Done {
			break
		}
		is = append(is, i)
	}
In case it is important, here is the complete code: https://github.com/koffeinsource/kaffeeshare/blob/master/data/appengine.go#L23
It looks like an anti-pattern to use a loop with an append, but I don't see a way to get a cursor when using GetMulti/GetAll, or am I missing something?
I do expect data to be added while users are querying the datastore, so an offset may produce duplicate results. Should I care about batching gets in this case?
Answer 1
Score: 1
Your approach is perfectly fine, in fact, it is the best way on AppEngine.
Setting a start cursor and querying for subsequent entities will not give you duplicate results, even if a newly inserted record would now sort first.
Why? Because the cursor contains the key of the last returned entity encoded, and not the number of previously returned entities.
So if you set a cursor, the datastore will start listing and returning entities that come after the key encoded in the cursor. If a new entity is saved that comes after the cursor, then that entity will be returned when reached.
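To make the key-based behaviour concrete, here is a self-contained sketch in plain Go (no App Engine dependency; `item`, `page`, and the integer keys are invented for illustration) that paginates by the last key seen, the way a datastore cursor does, and shows that an entity inserted before the cursor position does not shift the next page:

```go
package main

import (
	"fmt"
	"sort"
)

// item stands in for a datastore entity; Key plays the role of the entity key.
type item struct {
	Key  int
	Name string
}

// page returns up to limit items whose key comes after afterKey, mimicking
// how a datastore cursor encodes the key of the last returned entity.
func page(items []item, afterKey, limit int) (out []item, cursor int) {
	sort.Slice(items, func(i, j int) bool { return items[i].Key < items[j].Key })
	for _, it := range items {
		if it.Key <= afterKey {
			continue
		}
		out = append(out, it)
		cursor = it.Key
		if len(out) == limit {
			break
		}
	}
	return out, cursor
}

func main() {
	items := []item{{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"}}

	first, cur := page(items, 0, 2) // first page: keys 1 and 2
	fmt.Println(first[0].Name, first[1].Name, cur)

	// A new entity arrives that sorts *before* the cursor position…
	items = append(items, item{0, "new"})

	second, _ := page(items, cur, 2) // …yet the next page is still keys 3 and 4
	fmt.Println(second[0].Name, second[1].Name)
}
```

The second page contains no duplicates and is unaffected by the insert, because the "cursor" is a key, not a count of previously returned entities.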
Also, using for with append() is the best way. You might optimize it a little by creating a big enough slice beforehand:

	var is = make([]Item, 0, limit)
But note that I made it with 0 length and limit capacity on purpose: there is no guarantee that there will be enough entities to fill the full slice.
Another optimization would be to allocate it with limit length:

	var is = make([]Item, limit)

and when datastore.Done is reached, reslice it if it is not filled fully, for example:
	for idx := 0; ; idx++ {
		var i Item
		_, err := t.Next(&i)
		if err == datastore.Done {
			if idx < len(is) {
				is = is[:idx] // Reslice as it is not filled fully
			}
			break
		}
		is[idx] = i
	}
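The difference between the two allocation strategies can be checked in isolation; a minimal, runnable sketch (plain Go; the Item type is just a stand-in for the entity type above):

```go
package main

import "fmt"

// Item is a stand-in for the datastore entity type.
type Item struct{ Title string }

func main() {
	limit := 20

	// Variant 1: zero length, limit capacity. append() fills the slice
	// without reallocating until limit elements have been added.
	a := make([]Item, 0, limit)
	a = append(a, Item{"first"})
	fmt.Println(len(a), cap(a)) // 1 20

	// Variant 2: limit length. Elements are stored by index, and a
	// partially filled result is cut down by reslicing.
	b := make([]Item, limit)
	b[0] = Item{"first"}
	got := 1 // pretend the iterator returned only one entity
	b = b[:got]
	fmt.Println(len(b), cap(b)) // 1 20
}
```

Either way the backing array is allocated once, so the loop itself does no reallocation as long as at most limit entities are returned.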
Batch operations
> GetMulti, PutMulti and DeleteMulti are batch versions of the Get, Put and Delete functions. They take a []*Key instead of a *Key, and may return an appengine.MultiError when encountering partial failure.
Batch operations are not a replacement for, or an alternative to, queries. GetMulti, for example, requires you to already have all the keys for which you want to fetch the complete entities. As such, there is no sense of a cursor for these batch operations.
Batch operations return all the requested information (or perform all the requested operations). There is no sequence of entities or operations that could be interrupted and continued later on.
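To illustrate why no cursor applies, here is a toy in-memory stand-in (plain Go; getMulti, the store map, and errNoSuchEntity are all invented for illustration) showing the shape of a keyed batch lookup with per-key errors, similar to how appengine.MultiError reports partial failure:

```go
package main

import (
	"errors"
	"fmt"
)

var errNoSuchEntity = errors.New("no such entity")

// getMulti fetches every requested key in one call. The caller must already
// know all the keys — there is no iteration order to resume, hence no cursor.
func getMulti(store map[string]string, keys []string) ([]string, []error) {
	vals := make([]string, len(keys))
	errs := make([]error, len(keys)) // one slot per key, like appengine.MultiError
	for i, k := range keys {
		if v, ok := store[k]; ok {
			vals[i] = v
		} else {
			errs[i] = errNoSuchEntity // partial failure: only this key failed
		}
	}
	return vals, errs
}

func main() {
	store := map[string]string{"k1": "espresso", "k3": "latte"}
	vals, errs := getMulti(store, []string{"k1", "k2", "k3"})
	for i := range vals {
		fmt.Println(vals[i], errs[i])
	}
}
```

The whole request either covers a key or it doesn't; there is no "next page" of a batch get, which is why cursors belong to queries only.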
Queries and batch operations are for different things. You shouldn't worry about query and cursor performance. They perform well and, more importantly, the Datastore scales well. A cursor will not slow down the execution of a query: a query with a cursor runs just as fast as one without, and previously returned entities do not affect query execution time. It doesn't matter whether you run a query without a cursor or with a cursor acquired after fetching a million entities (which is only possible over several iterations).