App Engine (Flask) memory limit: how should I cache "large" (3 MB) database calls? How can I monitor memory usage on a local server or during testing?

Question

For some time now, I've been running into this error on my current project, golfcourse.wiki (App Engine F1, Python 3.9, Flask 2.0.2):

Exceeded hard memory limit of 384 MiB with 395 MiB after servicing 39 requests total. Consider setting a larger instance class in app.yaml.

This has been an issue more and more as my database has been growing. I thought it might be an issue with database calls (which I now clearly understand is wrong), so I added the built-in memcache to the system. This has been a huge pain, as adding a WSGI wrapper has made all my tests break, and I'm going to need to rewrite the __init__.py code using the app factory pattern and then go through the entire code rewriting app calls as current_app calls (I haven't gotten around to that yet).

Unfortunately, this process has actually increased the number of times the app crashes:

Exceeded hard memory limit of 384 MiB with 424 MiB after servicing 539 requests total. Consider setting a larger instance class in app.yaml.

While doing this, I realized some of my larger database calls (3 MB) were over the memcache limit, and so I wrote the common serialization workaround to deal with it and my memory exploded:

Exceeded hard memory limit of 384 MiB with 904 MiB after servicing 0 requests total. Consider setting a larger instance class in app.yaml.

I figured I'd bite the bullet and upgrade to F2... I couldn't believe it when it caused the memory usage to go up by 20%:

Exceeded hard memory limit of 768 MiB with 1105 MiB after servicing 0 requests total. Consider setting a larger instance class in app.yaml.

I'm deeply confused. I cannot find anything related to this in the App Engine docs (I've really looked). I'm admittedly an amateur at this stuff, even though I've been working with Python for over a decade.

Can someone explain to me:

  1. What is actually happening here? I understand that without seeing my entire code base nobody can point out the inefficiencies, but I'm now assuming that I'm running up against the RSS limit? When I run the process locally, I'm not coming anywhere close to that! As far as I can tell, the program reaches maybe 150 MiB while running locally.
  2. How can I monitor this with testing?
  3. Should I even worry about the number of database calls I'm making? I'm operating on a shoestring (obviously, being on F1), but I'm paying for MongoDB, and I don't think I'm anywhere near the quotas yet. It looks like I can cache most of the queries just fine, but the larger ones happen on the home page.

I've been trying to figure this out for months, so any help would be really appreciated.

Answer 1

Score: 1

I understand that error to mean that your instance ran out of RAM, so adding a cache wouldn't necessarily help. A cache saves you the round trip to the DB; it does not reduce how much you load into memory before returning a result to your website.
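To see how much memory a single call allocates during local runs or tests, one option is Python's standard `tracemalloc` module. The sketch below is only an illustration: `load_all_courses` is a hypothetical stand-in for the real query, and `tracemalloc` counts Python-level allocations, which is not the same number as the process RSS that App Engine enforces. It can still show that a result which is about 3 MB on the wire may occupy noticeably more once decoded into Python objects.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak MiB of Python allocations during the call)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


def load_all_courses(n=20_000):
    # Hypothetical stand-in for a query that returns every document.
    return [{"name": f"Course {i}", "lat": 40.0, "lng": -75.0} for i in range(n)]


if __name__ == "__main__":
    courses, peak = peak_memory_mib(load_all_courses)
    print(f"{len(courses)} courses, peak {peak:.1f} MiB")
```

In a test, the same helper could wrap a view function or query and assert that the peak stays under a budget you choose.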

> This has been an issue more and more as my database has been growing.

I visited your website. It looks like the problem is you are fetching your entire DB's list of golf courses to return the home page. You need to either:

  1. Only fetch the golf courses you need for that particular page load (e.g. only the golf courses within 100 miles of the user's location). Then, when your user moves on the map, query your backend for the golf courses at the new location.
  2. Produce a published list of all the golf courses, store it in a GCS bucket, and have your website read that file from the bucket.
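Since you mention MongoDB, the first option could be sketched roughly as follows. This is an assumption-heavy illustration, not your schema: it supposes each course document has a GeoJSON Point stored in a hypothetical `location` field with a `2dsphere` index on it. The function only builds the filter, using MongoDB's `$nearSphere` operator with `$maxDistance` in meters.

```python
METERS_PER_MILE = 1609.344


def nearby_courses_query(lat, lng, radius_miles=100.0):
    """Build a MongoDB filter for courses within radius_miles of (lat, lng)."""
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        raise ValueError("latitude/longitude out of range")
    return {
        "location": {  # hypothetical field holding a GeoJSON Point
            "$nearSphere": {
                # GeoJSON order is [longitude, latitude].
                "$geometry": {"type": "Point", "coordinates": [lng, lat]},
                "$maxDistance": radius_miles * METERS_PER_MILE,
            }
        }
    }


# Possible usage with PyMongo (collection and field names are assumptions):
#   db.courses.create_index([("location", "2dsphere")])
#   cursor = db.courses.find(
#       nearby_courses_query(40.0, -75.0),
#       {"name": 1, "location": 1},  # projection: return only needed fields
#   ).limit(200)
```

The projection and the `limit` both bound how much data a single page load can pull into memory, independent of how large the collection grows.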

There are other solutions to this problem too, but those are the simplest. Pulling your entire DB table into RAM every time the home page loads is just not a good idea.
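For the second option, a minimal sketch might look like the following, assuming the `google-cloud-storage` package is installed and that the bucket and object names (placeholders here) are yours to choose. The idea is to build the list once, for example from a scheduled job, rather than on every request.

```python
import json


def build_course_snapshot(courses):
    """Serialize an iterable of course dicts to compact JSON bytes."""
    return json.dumps(list(courses), separators=(",", ":")).encode("utf-8")


def publish_snapshot(data, bucket_name, blob_name="courses.json"):
    """Upload the snapshot to a GCS bucket (needs google-cloud-storage)."""
    # Imported lazily so the serializer above works without the package.
    from google.cloud import storage

    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    blob.upload_from_string(data, content_type="application/json")


if __name__ == "__main__":
    snapshot = build_course_snapshot(
        [{"name": "Example Course", "lat": 40.0, "lng": -75.0}]
    )
    print(len(snapshot), "bytes ready to upload")
    # publish_snapshot(snapshot, "my-bucket-name")  # placeholder bucket name
```

The home page can then fetch that file directly, so the Flask instance never has to hold the whole list in memory while serving a request.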

huangapple
  • Posted on July 20, 2023 at 09:31:50
  • Please keep this link when reposting: https://go.coder-hub.com/76726139.html