MongoDB聚合管道花费10分钟

huangapple go评论52阅读模式
英文:

MongoDB Aggregate pipeline taking 10 minutes

问题

I have two MongoDB collections: players and stats.

players collection have some data I am interested in, and a field named username which is used to reference the stats collection through a field on stats called username.

I have the following aggregate pipeline to get the required data from players collection that have data on stats collection, as well as few other conditions.

The problem I am facing is that this does not work on big datasets. When I ran this aggregate pipeline on a dataset of 54K documents in the players collection and 15k in the stats collection, it takes around 10 minutes to execute which is far from ideal.

What would be a possible enhancements to achieve this task on bigger datasets without taking such a long time?

Things I have also considered, I have tried to do a reverse lookup (from stats collection and lookup on players), but that did not help. I also tried several pipeline stage modifications (project, pipeline within the match stage, grouping, count) but nothing helped.

EDIT

I have tried indexing fields "username" on both collections and that helped to get it down from 10 minutes to 20-25 seconds, but I would like to bring it down more, what other suggestions would be?

英文:

I have two MongoDB collections: players and stats.

players collection have some data I am interested in, and a field named username which is used to reference the stats collection through a field on stats called username.

I have the following aggregate pipeline to get the required data from players collection that have data on stats collection, as well as few other conditions.

[{
        $match: {
            "username": {
                $exists: true, $ne: null, $not: {
                    $regex: "^\\s*$"
                }
            }
        }
    }, {
        $lookup: {
            from: "stats",
            localField: "username",
            foreignField: "username",
            as: "stats"
        }
    }, {
        $match: {
            stats: {
                $ne: []
            }
        }
    }, {
        $count: "count"
    }]

The problem I am facing is that this does not work on big datasets. When I ran this aggregate pipeline on a dataset of 54K documents in the players collection and 15k in the stats collection, it takes around 10 minutes to execute which is far from ideal.

What would be a possible enhancements to achieve this task on bigger datasets without taking such a long time?

Things I have also considered, I have tried to do a reverse lookup (from stats collection and lookup on players), but that did not help. I also tried several pipeline stage modifications (project, pipeline within the match stage, grouping, count) but nothing helped.

EDIT

I have tried indexing fields "username" on both collections and that helped to get it down from 10 minutes to 20-25 seconds, but I would like to bring it down more, what other suggestions would be?

答案1

得分: 1

尝试对您的集合进行索引,并创建频繁查询字段的索引。

英文:

Try indexing your collection and try to create indexes of the frequently queried fields

huangapple
  • 本文由 发表于 2023年4月1日 00:34:43
  • 转载请务必保留本文链接:https://go.coder-hub.com/75900833.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定