How to identify Mongo Atlas performance issues?
Question
I don't know how to identify Mongo performance issues in a "high-traffic project".
It's a MERN stack project where Mongo is deployed on AWS through MongoDB Atlas. All queries are run by a Heroku server on a Performance-L dyno with autoscaling enabled.
There are 2 databases with around 80 collections. One collection has around 130k records, 4-5 collections have around 40k-80k records, and the others have fewer than 5k records. I tried to test a user flow that generates ~110 Mongo queries; I estimated that number from the Mongo spans tracked by DataDog. In the production and development environments, each of these Mongo spans takes between 50 ms and 500 ms. I created a test suite in JMeter that runs this flow with 500 virtual users and a ramp-up period of 60 s. When I run this test, the Mongo spans become extremely long (>30 s) and cause request timeout errors on the server.
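For a rough sense of scale, the figures above can be turned into a back-of-the-envelope load estimate. This is only a sketch: it assumes that every virtual user runs the ~110-query flow exactly once and that the work is spread over the 60 s ramp-up window, so the rate is an upper bound. `estimateLoad` and its parameter names are made up for illustration. The in-flight figure uses Little's law (average in-flight work = arrival rate × latency).

```javascript
// Back-of-the-envelope estimate for the JMeter test described above.
// Assumption: each virtual user runs the ~110-query flow exactly once,
// and all queries are issued within the 60 s ramp-up window.
function estimateLoad({ users, queriesPerFlow, windowSeconds, latencySeconds }) {
  const totalQueries = users * queriesPerFlow;
  const queriesPerSecond = totalQueries / windowSeconds;
  // Little's law: average number of in-flight queries = rate * latency.
  const inFlight = queriesPerSecond * latencySeconds;
  return { totalQueries, queriesPerSecond, inFlight };
}

// Latency bounds taken from the 50 ms - 500 ms spans seen outside the load test.
const fast = estimateLoad({ users: 500, queriesPerFlow: 110, windowSeconds: 60, latencySeconds: 0.05 });
const slow = estimateLoad({ users: 500, queriesPerFlow: 110, windowSeconds: 60, latencySeconds: 0.5 });

console.log(fast.totalQueries);                                    // 55000
console.log(Math.round(fast.queriesPerSecond));                    // 917
console.log(Math.round(fast.inFlight), Math.round(slow.inFlight)); // 46 458
```

Under these assumptions the test issues about 55,000 queries, and the application has to keep somewhere between roughly 46 and 458 queries in flight at once. That number is worth comparing with what the application tier can actually sustain, for example the driver's `maxPoolSize` multiplied by the number of dynos.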
I tried upgrading the Mongo Atlas cluster to M200 (both the General and the Local NVMe SSD options), and I tried M300 as well. It didn't help; the Mongo spans are still too long. While the test was running, I didn't notice any spikes in Mongo Atlas -> Real Time monitor. CPU and Disk Util were under 5%. When I see the test failing, I stop it and check the traces in DataDog: there are no more than 1,000 Mongo spans (queries) there.
When I open the Mongo Atlas Profiling View, I can see that query execution time is a bit slower while the test is running, but most of the queries are missing. Do you know why the profiling view is missing some queries and doesn't show the slow queries (>30 s) that I can see in DataDog?
How is it possible that an environment as powerful as M200/M300 cannot process <50k queries against collections of <150k records within one minute?
Do you have any idea how I can identify what the issue with the Mongo server is? Here and here are screenshots from the Metrics view where you can see some spikes while the tests were running on the M200 configuration.
There are 3 recommendations in the Performance Advisor to add indexes to 3 collections. Do you think this could be why the Mongo server is so slow?
Answer 1
Score: 0
I just found out that the issue is with the Heroku server. Even with autoscaling on Performance dynos, the Heroku server could not process that many requests; you simply need to use more dynos. The Heroku server was blocked during the load tests. I still don't understand why DataDog shows long Mongo spans if the Mongo queries weren't executed. When I checked the Mongo logs, the queries were missing.
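One possible explanation for the long spans, offered as a hypothesis rather than a confirmed diagnosis: if the tracer starts a span when the application issues the query, the span also counts time the query spends waiting on the application side (for a free connection or for a blocked server process) before it reaches Mongo. The sketch below is a deterministic model in virtual time, with no real database involved. `simulateQueue`, the worker count, and the 300 ms service time are all made-up illustration values.

```javascript
// Toy model (virtual time, no real database): all requests arrive at t = 0
// and share a fixed number of workers, e.g. pooled connections.
// Assumption: the client-side span starts at arrival, not at execution.
function simulateQueue({ requests, workers, serviceMs }) {
  const freeAt = new Array(workers).fill(0); // when each worker is next free
  const results = [];
  for (let i = 0; i < requests; i++) {
    // Hand the request to the worker that becomes free first.
    let w = 0;
    for (let j = 1; j < workers; j++) {
      if (freeAt[j] < freeAt[w]) w = j;
    }
    const start = freeAt[w];
    const end = start + serviceMs;
    freeAt[w] = end;
    results.push({
      waitMs: start,       // time spent queued on the application side
      serverMs: serviceMs, // time the database would actually observe
      clientSpanMs: end,   // what a client-side span would report
    });
  }
  return results;
}

const spans = simulateQueue({ requests: 1000, workers: 10, serviceMs: 300 });
const worstClientSpan = Math.max(...spans.map((s) => s.clientSpanMs));
const worstServerTime = Math.max(...spans.map((s) => s.serverMs));

console.log(worstClientSpan); // 30000
console.log(worstServerTime); // 300
```

In this model the slowest client-side span is 30 s even though no query takes longer than 300 ms on the database side. That would be consistent with a lightly loaded cluster and with queries that are cut off by request timeouts never appearing in the Mongo logs.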