Question

I have two MongoDB collections: players and stats.

The players collection holds the data I am interested in, along with a username field that references the stats collection through its own username field.

I use the aggregation pipeline shown below to fetch the players documents that have matching data in the stats collection and that also satisfy a few other conditions.

The problem is that this does not scale to large datasets. With 54K documents in the players collection and 15K in the stats collection, the pipeline takes around 10 minutes to run, which is far from ideal.

What enhancements would let this run on bigger datasets without taking so long?

Things I have already tried: a reverse lookup (starting from the stats collection and looking up players) did not help. Several pipeline modifications (project, a pipeline within the match stage, grouping, count) did not help either.

EDIT

I have indexed the username field on both collections, which brought the run time down from 10 minutes to 20-25 seconds, but I would like to bring it down further. What other suggestions are there?


[{
    $match: {
        "username": {
            $exists: true, $ne: null, $not: {
                $regex: "^\\s*$"
            }
        }
    }
}, {
    $lookup: {
        from: "stats",
        localField: "username",
        foreignField: "username",
        as: "stats"
    }
}, {
    $match: {
        stats: {
            $ne: []
        }
    }
}, {
    $count: "count"
}]


Answer 1

Score: 1


Try indexing your collections: create indexes on the fields you query most often, which here means username on both players and stats.
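As a rough sketch of that advice, the snippet below defines the index specification for username and a variant of the question's pipeline in which each $lookup returns at most one small document per player. The variable names usernameIndex and countPlayersWithStats are made up for illustration, and the sub-pipeline inside $lookup assumes MongoDB 5.0 or later, where localField/foreignField can be combined with pipeline. Whether it actually shortens the run time depends on the data, so treat it as something to measure rather than a guaranteed fix.

```javascript
// Index specification: one ascending index on `username`, to be created
// on both collections (names taken from the question).
const usernameIndex = { username: 1 };

// Same count as the original pipeline, but the joined array holds at most
// one document containing only _id, instead of every matching stats document.
// Assumption: MongoDB 5.0+ (localField/foreignField together with pipeline).
const countPlayersWithStats = [
  {
    $match: {
      username: { $exists: true, $ne: null, $not: { $regex: "^\\s*$" } },
    },
  },
  {
    $lookup: {
      from: "stats",
      localField: "username",
      foreignField: "username",
      pipeline: [{ $limit: 1 }, { $project: { _id: 1 } }],
      as: "stats",
    },
  },
  { $match: { stats: { $ne: [] } } },
  { $count: "count" },
];

// In mongosh, where the `db` global exists, this would be used as:
//   db.players.createIndex(usernameIndex);
//   db.stats.createIndex(usernameIndex);
//   db.players.aggregate(countPlayersWithStats);
```

Running the aggregation with explain("executionStats") before and after is one way to check whether the username indexes are used and where the remaining time goes.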
