Golang on App Engine Datastore - Using PutMulti to Improve Performance
Question
I have a GAE Golang app that should be able to handle hundreds of concurrent requests, and for each request I do some work on the input and then store it in the datastore.
Using the task queue (the appengine/delay lib) I am getting pretty good performance, but it still seems very inefficient to perform single-row inserts for each request (even though the inserts are deferred via the task queue).
If this were not App Engine, I would probably append the output to a file, and every once in a while I would batch-load the file into the DB using a cron job or some other kind of scheduled service.
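For reference, the per-request deferred insert described above might look roughly like the sketch below, assuming the google.golang.org/appengine packages; the `Row` kind, its fields, and the handler wiring are illustrative assumptions, not the asker's actual code.

```go
package app

import (
	"context"
	"net/http"
	"time"

	"google.golang.org/appengine"
	"google.golang.org/appengine/datastore"
	"google.golang.org/appengine/delay"
)

// Row is a hypothetical stand-in for whatever entity each request produces.
type Row struct {
	Data    string
	Created time.Time
}

// putRow is the deferred task: still a single datastore.Put per request.
var putRow = delay.Func("put-row", func(ctx context.Context, row Row) error {
	key := datastore.NewIncompleteKey(ctx, "Row", nil)
	_, err := datastore.Put(ctx, key, &row)
	return err
})

func handler(w http.ResponseWriter, r *http.Request) {
	ctx := appengine.NewContext(r)
	row := Row{Data: r.FormValue("data"), Created: time.Now()}

	// Defer the write to the task queue so the request can return quickly,
	// but note this still ends up as one Put per incoming request.
	if err := putRow.Call(ctx, row); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}
```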
So my questions are:
- Is there an equivalent scheme I can implement on App Engine? I was thinking - perhaps I should write some of the rows to memcache, and then every couple of seconds bulk-load all of the rows from there and purge the cache.
- Is this really needed? Can the datastore handle thousands of concurrent writes - a write per HTTP request my app is getting?
Answer 1
Score: 1
It really depends on your setup. Are you using ancestor queries? If so, then you are limited to 1 write per second per ancestor (and all of its children and grandchildren). The datastore has a natural queue, so if you try to write too quickly it will queue the writes; it only becomes an issue if you are writing too many, too quickly. You can read some best practices here.
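To make the entity-group point concrete, here is a small sketch (the `Customer` and `Row` kinds are made up) showing the difference between keys that share an ancestor and independent root keys:

```go
package app

import (
	"context"

	"google.golang.org/appengine/datastore"
)

// keysFor illustrates entity groups: every entity created under the same
// ancestor shares that group's write-rate limit (~1 write/sec), while root
// entities (nil parent) each form their own group.
func keysFor(ctx context.Context) (grouped, root *datastore.Key) {
	// Hypothetical parent entity.
	parent := datastore.NewKey(ctx, "Customer", "acme", 0, nil)

	// Child of parent -> same entity group, shares the group's write limit.
	grouped = datastore.NewIncompleteKey(ctx, "Row", parent)

	// No parent -> its own entity group, not throttled with the others.
	root = datastore.NewIncompleteKey(ctx, "Row", nil)
	return grouped, root
}
```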
If you think you will be going over that limit, use a pull queue with async multi-puts. You would put each entity in the queue. With a backend module (10-minute timeouts) you can pull in the entries in batches (10-50-100...) and do a put_async on them in batches. It will handle putting them in at the proper speed. While it's working you can queue up the next batch. Just be wary of the timeout.
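In Go, the "pull queue + batched put" idea might look roughly like the sketch below: request handlers enqueue payloads onto a pull queue, and a handler running on a backend module leases a batch, writes it with a single datastore.PutMulti, and then deletes the leased tasks. The queue name "rows", the hypothetical `Row` type, and the batch size are assumptions; error handling is kept minimal.

```go
package app

import (
	"context"
	"encoding/json"
	"net/http"
	"time"

	"google.golang.org/appengine"
	"google.golang.org/appengine/datastore"
	"google.golang.org/appengine/taskqueue"
)

// Row is a hypothetical stand-in for the entity each request produces.
type Row struct {
	Data    string
	Created time.Time
}

// enqueueRow is called from the normal request handlers: instead of writing
// to the datastore directly, it drops a payload onto a pull queue.
func enqueueRow(ctx context.Context, row Row) error {
	payload, err := json.Marshal(row)
	if err != nil {
		return err
	}
	_, err = taskqueue.Add(ctx, &taskqueue.Task{
		Payload: payload,
		Method:  "PULL", // pull-queue task, not a push task
	}, "rows")
	return err
}

// flushRows runs on a backend module (e.g. in a loop or triggered by cron):
// it leases a batch of tasks, stores them with one PutMulti, then deletes
// the tasks so they are not re-leased.
func flushRows(w http.ResponseWriter, r *http.Request) {
	ctx := appengine.NewContext(r)

	// Lease up to 100 tasks for 60 seconds from the "rows" pull queue.
	tasks, err := taskqueue.Lease(ctx, 100, "rows", 60)
	if err != nil || len(tasks) == 0 {
		return
	}

	keys := make([]*datastore.Key, 0, len(tasks))
	rows := make([]Row, 0, len(tasks))
	for _, t := range tasks {
		var row Row
		if err := json.Unmarshal(t.Payload, &row); err != nil {
			continue // skip malformed payloads
		}
		keys = append(keys, datastore.NewIncompleteKey(ctx, "Row", nil))
		rows = append(rows, row)
	}

	// One batched write instead of len(rows) individual Puts.
	if _, err := datastore.PutMulti(ctx, keys, rows); err != nil {
		return // leases will expire and the batch will be retried
	}

	// Delete the tasks only after the batch is safely stored. If deletion
	// fails, the tasks are re-leased and some rows may be written twice.
	_ = taskqueue.DeleteMulti(ctx, tasks, "rows")
}
```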