Google App Engine: least expensive way to run a heavy datastore-write cron job?


Question

I have a Google App Engine application, written in Go, with a cron process that runs once a day at 3am. This process looks at all of the changes that have happened to my data during the day and stores some metadata about what happened. My users can run reports on this metadata to see trends over several months. The process does around 10-20 million datastore writes every night. It all works just fine, but since I started running it I have noticed a significant increase in my monthly bill from Google (from around $50/month to around $400/month).

I have just set up a very basic task queue that this runs in, and I have not changed the default settings at all. Is there a better way I could be running this process at night that would save me money? I have never messed around with the backends (which are now deprecated) or the Modules API, and I know they've changed a lot of this stuff recently, so I'm not sure where to start looking. Any advice would be greatly appreciated.
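For reference, a minimal sketch of what that kind of setup can look like with the google.golang.org/appengine packages is below. The handler paths ("/cron/rollup", "/worker/rollup"), the queue name ("rollup-queue"), and the chunk count are all invented for illustration; the real fan-out would depend on how the day's changes are keyed.

```go
// Hypothetical fan-out: the cron endpoint splits the nightly job into chunks
// and enqueues one push-queue task per chunk instead of doing the work inline.
package rollup

import (
	"fmt"
	"net/http"
	"net/url"

	"google.golang.org/appengine"
	"google.golang.org/appengine/taskqueue"
)

func init() {
	http.HandleFunc("/cron/rollup", cronHandler)     // target of the 3am cron entry
	http.HandleFunc("/worker/rollup", workerHandler) // processes one chunk of changes
}

// cronHandler enqueues one task per chunk of the day's changes.
func cronHandler(w http.ResponseWriter, r *http.Request) {
	c := appengine.NewContext(r)

	const chunks = 100 // made-up fan-out factor
	for i := 0; i < chunks; i++ {
		t := taskqueue.NewPOSTTask("/worker/rollup", url.Values{"chunk": {fmt.Sprint(i)}})
		if _, err := taskqueue.Add(c, t, "rollup-queue"); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
	}
	fmt.Fprintln(w, "enqueued")
}

// workerHandler is where the per-chunk datastore writes would happen.
func workerHandler(w http.ResponseWriter, r *http.Request) {
	chunk := r.FormValue("chunk")
	_ = chunk // look up this chunk's changes and write the metadata entities
}
```

In this sketch, cron.yaml would point the 3am entry at /cron/rollup and queue.yaml would define rollup-queue.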

Answer 1

Score: 1

Look at your instances at 3am. It might be that GAE spins up a lot of them to handle the job. You could configure the job to run with less parallelism; it will take longer, but perhaps it will then need only one instance.

However, if your datastore writes are indeed the biggest cost factor, this won't make a big impact.
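How much parallelism the queue allows is mostly controlled by the queue's rate and max_concurrent_requests settings in queue.yaml, but the tasks can also be spread out in code. A rough, hypothetical sketch (same invented paths and queue name as above): give each task an increasing Delay so the queue drains slowly and the job fits on one or two instances.

```go
// Hypothetical variant of the fan-out above: stagger tasks with a Delay so
// App Engine does not need to spin up many instances at once.
package rollup

import (
	"context"
	"fmt"
	"net/url"
	"time"

	"google.golang.org/appengine/taskqueue"
)

// enqueueStaggered spaces the chunk tasks 30 seconds apart (a made-up
// interval); the whole job takes longer but runs with far less concurrency.
func enqueueStaggered(c context.Context, chunks int) error {
	for i := 0; i < chunks; i++ {
		t := taskqueue.NewPOSTTask("/worker/rollup", url.Values{"chunk": {fmt.Sprint(i)}})
		t.Delay = time.Duration(i) * 30 * time.Second
		if _, err := taskqueue.Add(c, t, "rollup-queue"); err != nil {
			return err
		}
	}
	return nil
}
```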

You can also try looking at your data models and indexes. Remember that each indexed field costs 2 extra writes, so see if you can remove indexes from fields you don't need to query on.
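As a concrete, made-up example of what that looks like in Go: the datastore struct tags let you mark fields as noindex so they don't generate index writes, and only the fields the reports actually filter or sort on need to stay indexed.

```go
// Hypothetical metadata entity: only the fields reports query on are indexed;
// everything else is stored with ",noindex" to cut per-entity write costs.
package rollup

import "time"

type ChangeMeta struct {
	UserID   string    // indexed: reports filter by user
	Day      time.Time // indexed: reports filter by date range
	Field    string    `datastore:",noindex"` // never queried directly
	OldValue string    `datastore:",noindex"`
	NewValue string    `datastore:",noindex"`
	Count    int64     `datastore:",noindex"`
}
```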

Answer 2

Score: 1

One improvement you can make is to batch your write operations; you can use memcache for this (pay for the dedicated memcache, since it's more reliable). Write the updates to memcache and, once the buffer reaches about 900KB, flush it to the datastore. This will reduce the number of writes to the datastore a lot, especially if your metadata entries are small.
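A rough sketch of that buffering pattern with the Go memcache and datastore packages is below. It reuses the hypothetical ChangeMeta type from the previous answer; the memcache key and the flush threshold are invented, and since memcache items can be evicted at any time, anything you can't afford to lose would still need a durable fallback.

```go
// Hypothetical batching of metadata writes through memcache, then flushing
// to the datastore in bulk with PutMulti.
package rollup

import (
	"context"

	"google.golang.org/appengine/datastore"
	"google.golang.org/appengine/memcache"
)

// pendingKey is a made-up memcache key for the buffered updates.
const pendingKey = "rollup:pending"

// bufferUpdate appends one record to a buffer held in memcache and flushes it
// to the datastore once it is large enough. This sketch is not safe if several
// workers update the same buffer concurrently.
func bufferUpdate(c context.Context, m ChangeMeta) error {
	var pending []ChangeMeta
	if _, err := memcache.JSON.Get(c, pendingKey, &pending); err != nil && err != memcache.ErrCacheMiss {
		return err
	}
	pending = append(pending, m)

	// Flush well before the 1MB memcache item limit; 500 is also the maximum
	// number of entities a single datastore.PutMulti call accepts.
	if len(pending) >= 500 {
		keys := make([]*datastore.Key, len(pending))
		for i := range keys {
			keys[i] = datastore.NewIncompleteKey(c, "ChangeMeta", nil)
		}
		if _, err := datastore.PutMulti(c, keys, pending); err != nil {
			return err
		}
		pending = pending[:0]
	}
	return memcache.JSON.Set(c, &memcache.Item{Key: pendingKey, Object: pending})
}
```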
