Does MongoDB use indexes after root replacement?

Question

I use the following aggregation pipeline:

db.myCollection.aggregate([
    {
        "$match": {
            "x": {
                "$lte": 100
            }
        }
    }, {
        "$sort": {
            "x": -1
        }
    }, {
        "$group": {
            "_id": {
                "y": "$y",
                "z": "$z"
            },
            "r": {
                "$first": "$$CURRENT"
            }
        }
    }, {
        "$replaceRoot": {
            "newRoot": "$r"
        }
    }, {
        "$match": {
            "a": true,
            "b": "value"
        }
    }
]);

Obviously, it's possible to improve the performance of the query by adding the following index:

db.myCollection.createIndex({ x: -1 });

The documentation says the $group stage will not use indexes, because it groups by different fields than the one used in the $match and $sort stages. Am I right, or is it possible to add the x and y fields to the first index so that $group can use it?

Also, please help me understand: can MongoDB use indexes after the $replaceRoot stage? Will it improve performance if I add the following index?

db.myCollection.createIndex({ a: 1, b: 1 });

Answer 1

Score: 4

You've asked two different questions here. This answer will focus mostly on the question in the title, which is about index usage in a $match stage that follows a $replaceRoot.

The answer to that question is: no, an index cannot help here in your particular case.

Keep in mind that indexes are used to efficiently find documents in the collection. But after the $group stage here (and therefore after the $replaceRoot as well) you no longer have the original documents; you have synthesized something new. The database operates on those new items from that point on, so there is generally no benefit in (or even a way of) going back to the collection to scan source documents again.

There is potentially a special case when the $match predicates are on fields used during the grouping. However, that does not apply in this case either. You confirmed in the comments that you cannot manually move the trailing $match stage earlier, as it would break the business logic. The database cannot do so for the same reason: it is not allowed to perform any optimization that would logically change the results requested by the client.
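To see why the trailing $match cannot simply be moved ahead of $group, here is a minimal in-memory sketch in plain JavaScript, with no database involved. The sample documents and the helper names latestPerGroup and matchAB are made up for illustration; they only mimic the $match, $sort, $group and $replaceRoot stages of the pipeline in the question.

```javascript
// Hypothetical sample data: the first two documents share the same (y, z) group.
const docs = [
  { x: 90, y: 1, z: 1, a: false, b: "value" }, // highest x in group (1, 1)
  { x: 50, y: 1, z: 1, a: true, b: "value" },  // lower x, but passes the a/b filter
  { x: 70, y: 2, z: 2, a: true, b: "value" },  // only member of group (2, 2)
];

// Mimics $match {x: {$lte: 100}}, $sort {x: -1}, $group with $first, $replaceRoot.
function latestPerGroup(input) {
  const sorted = input.filter((d) => d.x <= 100).sort((p, q) => q.x - p.x);
  const firstByKey = new Map();
  for (const d of sorted) {
    const key = JSON.stringify([d.y, d.z]);
    // Keep only the first document seen per key, i.e. the one with the highest x.
    if (!firstByKey.has(key)) firstByKey.set(key, d);
  }
  return [...firstByKey.values()];
}

// Mimics the trailing $match {a: true, b: "value"}.
const matchAB = (d) => d.a === true && d.b === "value";

// Original order: group first, then filter the synthesized results.
const filterAfter = latestPerGroup(docs).filter(matchAB);

// Reordered: filter first (where an {a, b} index could help), then group.
const filterBefore = latestPerGroup(docs.filter(matchAB));

console.log(filterAfter.map((d) => d.x));  // [ 70 ]
console.log(filterBefore.map((d) => d.x)); // [ 70, 50 ]
```

With the original order, group (1, 1) is represented by the x = 90 document, which fails the a/b filter, so the group disappears. With the filter moved first, the x = 50 document becomes the group's representative and shows up in the result. The two orderings answer different questions, which is why neither you nor the optimizer can swap them.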

As for your other question about index usage for $group itself: last I checked, the optimization you linked was not applicable to grouping on compound values, as you have in your example. You could try creating a compound index on all three fields ({ x: 1, y: 1, z: 1 }) and see whether the .explain() output shows a plan with a DISTINCT_SCAN on the version that you are using. Even if it does, it might not necessarily be faster, depending on how many unique values there are for y and z.
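One way to run that check is to look through the explain output for a DISTINCT_SCAN stage. The sketch below is plain JavaScript: collectStages is a hypothetical helper (not a MongoDB API), and sampleExplain is a hand-written stand-in whose shape is only an assumption, since real explain output differs between server versions. The mongosh calls in the trailing comment assume a variable named pipeline holding the stages from the question.

```javascript
// Recursively collect every string stored under a "stage" key.
// Hypothetical helper: it makes no assumption about where the plan is nested.
function collectStages(node, found = []) {
  if (Array.isArray(node)) {
    for (const item of node) collectStages(item, found);
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === "stage" && typeof value === "string") found.push(value);
      else collectStages(value, found);
    }
  }
  return found;
}

// Hand-written stand-in for explain output (assumed shape, not real server output).
const sampleExplain = {
  stages: [
    {
      $cursor: {
        queryPlanner: {
          winningPlan: {
            stage: "FETCH",
            inputStage: { stage: "IXSCAN", indexName: "x_1_y_1_z_1" },
          },
        },
      },
    },
    { $group: {} },
  ],
};

const stages = collectStages(sampleExplain);
console.log(stages);                            // [ 'FETCH', 'IXSCAN' ]
console.log(stages.includes("DISTINCT_SCAN")); // false

// In mongosh, against a real deployment, the equivalent check would be roughly:
//   db.myCollection.createIndex({ x: 1, y: 1, z: 1 });
//   const plan = db.myCollection.explain().aggregate(pipeline);
//   collectStages(plan).includes("DISTINCT_SCAN");
```

Because the helper walks the whole object, it would also pick up stages listed under rejected plans, so a positive result is worth confirming by reading the winning plan directly.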

Posted by huangapple on 2023-03-09 20:32:27
Source: https://go.coder-hub.com/75684665.html