英文:
DocumentDB query is using the wrong index
问题
I'm using MongoDB version 4.0.0 with DocumentDB.
I have a collection called transactions
. That collection has these two indices:
{
"creation_date": -1
}
{
"project_id": 1,
"mid.id": 1
}
I'm running the following query:
db.transactions.explain('executionStats').find({
"project_id": "1",
"mid.id": {
$in: [/* array with 47 ids */]
}
}).sort({creation_date: -1}).skip(0).limit(50)
The execution stats for that code are as follows:
{
"queryPlanner": {
"plannerVersion": 1,
"namespace": "myDB.transactions",
"winningPlan": {
"stage": "SUBSCAN",
"inputStage": {
"stage": "LIMIT_SKIP",
"inputStage": {
"stage": "IXSCAN",
"indexName": "creation_date_idx",
"direction": "forward"
}
}
}
},
"executionStats": {
"executionSuccess": true,
"executionTimeMillis": "445731.524",
"planningTimeMillis": "1.487",
"executionStages": {
"stage": "SUBSCAN",
"nReturned": "44",
"executionTimeMillisEstimate": "445729.856",
"inputStage": {
"stage": "LIMIT_SKIP",
"nReturned": "44",
"executionTimeMillisEstimate": "445729.787",
"inputStage": {
"stage": "IXSCAN",
"nReturned": "44",
"executionTimeMillisEstimate": "445729.765",
"indexName": "creation_date_idx",
"direction": "forward"
}
}
}
},
"serverInfo": {
"host": "127.0.0.1",
"port": 27017,
"version": "4.0.0"
},
"ok": 1,
"operationTime": {
"$timestamp": "7231419170506670081"
}
}
It chose the creation_date
index and took 445 seconds to perform the query!
If I try to hint the query to force using the project_id
and mid.id
index, here is the winning plan:
{
"queryPlanner": {
"plannerVersion": 1,
"namespace": "myDB.transactions",
"winningPlan": {
"stage": "SUBSCAN",
"inputStage": {
"stage": "LIMIT_SKIP",
"inputStage": {
"stage": "SORT",
"sortPattern": {
"creation_date": -1
},
"inputStage": {
"stage": "IXSCAN",
"indexName": "project_id_mid.id_idx",
"direction": "forward"
}
}
}
}
},
"executionStats": {
"executionSuccess": true,
"executionTimeMillis": "10.966",
"planningTimeMillis": "1.615",
"executionStages": {
"stage": "SUBSCAN",
"nReturned": "44",
"executionTimeMillisEstimate": "9.310",
"inputStage": {
"stage": "LIMIT_SKIP",
"nReturned": "44",
"executionTimeMillisEstimate": "9.292",
"inputStage": {
"stage": "SORT",
"nReturned": "44",
"executionTimeMillisEstimate": "9.287",
"sortPattern": {
"creation_date": -1
},
"inputStage": {
"stage": "IXSCAN",
"nReturned": "44",
"executionTimeMillisEstimate": "9.208",
"indexName": "project_id_mid.id_idx",
"direction": "forward"
}
}
}
}
},
"serverInfo": {
"host": "127.0.0.1",
"port": 27017,
"version": "4.0.0"
},
"ok": 1,
"operationTime": {
"$timestamp": "7231419990845423617"
}
}
It takes 10 milliseconds to perform the query. So... why is DocumentDB choosing the creation_date
index by default? Is something wrong with my query?
Some other things I tried:
- If I run the query with MongoDB Compass pointing to DocumentDB, the query works fine without me having to hint anything.
- If I reduce the size of the
mid.id
array to 14 items, it works
英文:
I'm using MongoDB version 4.0.0 with DocumentDB.
I have a collection called transactions
. That collection has this two indices:
{
"creation_date": -1
}
{
"project_id": 1,
"mid.id": 1
}
I'm running the following query:
db.transactions.explain('executionStats').find({
"project_id": "1",
"mid.id": {
$in: [/* array with 47 ids */]
}
}).sort({creation_date: -1}).skip(0).limit(50)
The execution stats for that code are the following:
{
"queryPlanner": {
"plannerVersion": 1,
"namespace": "myDB.transactions",
"winningPlan": {
"stage": "SUBSCAN",
"inputStage": {
"stage": "LIMIT_SKIP",
"inputStage": {
"stage": "IXSCAN",
"indexName": "creation_date_idx",
"direction": "forward"
}
}
}
},
"executionStats": {
"executionSuccess": true,
"executionTimeMillis": "445731.524",
"planningTimeMillis": "1.487",
"executionStages": {
"stage": "SUBSCAN",
"nReturned": "44",
"executionTimeMillisEstimate": "445729.856",
"inputStage": {
"stage": "LIMIT_SKIP",
"nReturned": "44",
"executionTimeMillisEstimate": "445729.787",
"inputStage": {
"stage": "IXSCAN",
"nReturned": "44",
"executionTimeMillisEstimate": "445729.765",
"indexName": "creation_date_idx",
"direction": "forward"
}
}
}
},
"serverInfo": {
"host": "127.0.0.1",
"port": 27017,
"version": "4.0.0"
},
"ok": 1,
"operationTime": {
"$timestamp": "7231419170506670081"
}
}
It chose the creation_date
index and took 445 seconds to perform the query!
If I try, instead, to hint the query to force using the project_id
and mid.id
index, here is the winning plan:
{
"queryPlanner": {
"plannerVersion": 1,
"namespace": "myDB.transactions",
"winningPlan": {
"stage": "SUBSCAN",
"inputStage": {
"stage": "LIMIT_SKIP",
"inputStage": {
"stage": "SORT",
"sortPattern": {
"creation_date": -1
},
"inputStage": {
"stage": "IXSCAN",
"indexName": "project_id_mid.id_idx",
"direction": "forward"
}
}
}
}
},
"executionStats": {
"executionSuccess": true,
"executionTimeMillis": "10.966",
"planningTimeMillis": "1.615",
"executionStages": {
"stage": "SUBSCAN",
"nReturned": "44",
"executionTimeMillisEstimate": "9.310",
"inputStage": {
"stage": "LIMIT_SKIP",
"nReturned": "44",
"executionTimeMillisEstimate": "9.292",
"inputStage": {
"stage": "SORT",
"nReturned": "44",
"executionTimeMillisEstimate": "9.287",
"sortPattern": {
"creation_date": -1
},
"inputStage": {
"stage": "IXSCAN",
"nReturned": "44",
"executionTimeMillisEstimate": "9.208",
"indexName": "project_id_mid.id_idx",
"direction": "forward"
}
}
}
}
},
"serverInfo": {
"host": "127.0.0.1",
"port": 27017,
"version": "4.0.0"
},
"ok": 1,
"operationTime": {
"$timestamp": "7231419990845423617"
}
}
It takes 10 milliseconds to perform the query. So... why is DocumentDB choosing the creation_date
index by default? Is something wrong with my query?
Some other things I tried:
- If I run the query with MongoDB Compass pointing to DocumentDB, the query works fine without me having to hint anything.
- If I reduce the size of the
mid.id
array to 14 items, it works
答案1
得分: 2
你迫使查询规划器在选择最小化检查文档数量的索引和避免内存排序的索引之间做出选择。在测试计划时,哪个性能更好取决于索引开头附近有最多结果的索引以及避免内存排序的索引会因减少内存使用而获得奖励。
如果您创建一个既可以执行这两项任务的索引,就像这样:
{
"project_id": 1,
"mid.id": 1,
"creation_date": -1
}
那么它应该在每次比赛中获胜,毫不费力。
英文:
You are forcing the query planner to choose between an index that minimized the number of documents examined, and an index that avoids an in-memory sort. Which performs better when testing the plans will depend on things like which index has the most results near the beginning of the index, and the one that avoids the in-memory sort gets a bonus for reduced memory usage.
If you create an index that will do both, like:
{
"project_id": 1,
"mid.id": 1,
"creation_date": -1
}
It should win, hands down, every time.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论