英文:
MongoDB Aggregate pipeline taking 10 minutes
问题
I have two MongoDB collections: players
and stats
.
players
collection have some data I am interested in, and a field named username
which is used to reference the stats
collection through a field on stats
called username
.
I have the following aggregate pipeline to get the required data from players
collection that have data on stats
collection, as well as few other conditions.
The problem I am facing is that this does not work on big datasets. When I ran this aggregate pipeline on a dataset of 54K documents in the players
collection and 15k in the stats
collection, it takes around 10 minutes to execute which is far from ideal.
What would be a possible enhancements to achieve this task on bigger datasets without taking such a long time?
Things I have also considered, I have tried to do a reverse lookup (from stats
collection and lookup on players
), but that did not help. I also tried several pipeline stage modifications (project, pipeline within the match stage, grouping, count) but nothing helped.
EDIT
I have tried indexing fields "username" on both collections and that helped to get it down from 10 minutes to 20-25 seconds, but I would like to bring it down more, what other suggestions would be?
英文:
I have two MongoDB collections: players
and stats
.
players
collection have some data I am interested in, and a field named username
which is used to reference the stats
collection through a field on stats
called username
.
I have the following aggregate pipeline to get the required data from players
collection that have data on stats
collection, as well as few other conditions.
[{
$match: {
"username": {
$exists: true, $ne: null, $not: {
$regex: "^\\s*$"
}
}
}
}, {
$lookup: {
from: "stats",
localField: "username",
foreignField: "username",
as: "stats"
}
}, {
$match: {
stats: {
$ne: []
}
}
}, {
$count: "count"
}]
The problem I am facing is that this does not work on big datasets. When I ran this aggregate pipeline on a dataset of 54K documents in the players
collection and 15k in the stats
collection, it takes around 10 minutes to execute which is far from ideal.
What would be a possible enhancements to achieve this task on bigger datasets without taking such a long time?
Things I have also considered, I have tried to do a reverse lookup (from stats
collection and lookup on players
), but that did not help. I also tried several pipeline stage modifications (project, pipeline within the match stage, grouping, count) but nothing helped.
EDIT
I have tried indexing fields "username" on both collections and that helped to get it down from 10 minutes to 20-25 seconds, but I would like to bring it down more, what other suggestions would be?
答案1
得分: 1
尝试对您的集合进行索引,并创建频繁查询字段的索引。
英文:
Try indexing your collection and try to create indexes of the frequently queried fields
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论