多键索引,性能缺陷

huangapple go评论114阅读模式
英文:

Multikey indexing, performance drawbacks

问题

Consider a collection of approximately 1 mln. documents having the following schema:

{
    _id: ObjectId(),
    Title: "",
    Description: "",
    Scheduling: {
        From: 20230202,
        To: 20230201,
        MagicValue: 12
    },
    Mandatory: true,
    Type: "None"
}

Could there be any possible drawback in creating a compound index which contains a multikey index value like so?

{ _id: 1, Title: 1, Scheduling.From: 1 }

The documentation cites limitations only when dealing with nested array documents and goes on explaining index bounds creation rules and so on... If you really think about it the dummy document above could really be expressed as:

{
    _id: ObjectId(),
    Title: "",
    Description: "",
    Scheduling.From: 20230202,
    Scheduling.To: 20230201,
    Scheduling.MagicValue: 12,
    Mandatory: true,
    Type: "None"
}

And it wouldn't make that much of a difference in terms of how the indexing engine treats it, as the "Scheduling" inner document should act like a namespace. Therefore I'm asking here... would the latter document behave differently than the first one above in respect to the same compound index? Thanks 多键索引,性能缺陷

英文:

Consider a collection of approximately 1 mln. documents having the following schema:

{
    _id: ObjectId(),
    Title: "",
    Description: "",
    Scheduling: {
        From: 20230202,
        To: 20230201,
        MagicValue: 12
    },
    Mandatory: true,
    Type: "None"
}

Could there be any possible drawback in creating a compound index which contains a multikey index value like so?

{ _id: 1, Title: 1, Scheduling.From: 1 }

The documentation cites limitations only when dealing with nested array documents and goes on explaining index bounds creation rules and so on... If you really think about it the dummy document above could really be expressed as:

{
    _id: ObjectId(),
    Title: "",
    Description: "",
    Scheduling.From: 20230202,
    Scheduling.To: 20230201,
    Scheduling.MagicValue: 12,
    Mandatory: true,
    Type: "None"
}

And it wouldn't make that much of a difference in terms of how the indexing engine treats it, as the "Scheduling" inner document should act like a namespace. Therefore I'm asking here... would the latter document behave differently than the first one above in respect to the same compound index? Thanks 多键索引,性能缺陷

答案1

得分: 2

我的评论中,你描述的情况 不会 导致索引成为多键索引(在MongoDB术语中)。多键索引是根据文档中的以下条件自动生成的,如文档中所述:

如果任何索引字段是数组,MongoDB会自动创建多键索引;您无需显式指定多键类型。

你的文档中没有字段,因此没有任何索引字段是数组。我们可以通过以下测试证明索引不是多键的:

test> db.foo.drop()
false
test> db.foo.insert({ _id: ObjectId(), Title: "", Description: "", Scheduling: { From: 20230202, To: 20230201, MagicValue: 12 }, Mandatory: true, Type: "None" })
{
  acknowledged: true,
  insertedIds: { '0': ObjectId("649de6208c8616ab438dd397") }
}
test> db.foo.createIndex({ _id: 1, Title: 1, "Scheduling.From": 1 })
_id_1_Title_1_Scheduling.From_1
test> db.foo.find().hint("_id_1_Title_1_Scheduling.From_1").explain().queryPlanner.winningPlan.inputStage
{
  stage: 'IXSCAN',
  keyPattern: { _id: 1, Title: 1, 'Scheduling.From': 1 },
  indexName: '_id_1_Title_1_Scheduling.From_1',
  isMultiKey: false,
  multiKeyPaths: { _id: [], Title: [], 'Scheduling.From': [] },
  isUnique: false,
  isSparse: false,
  isPartial: false,
  indexVersion: 2,
  direction: 'forward',
  indexBounds: {
    _id: [ '[MinKey, MaxKey]' ],
    Title: [ '[MinKey, MaxKey]' ],
    'Scheduling.From': [ '[MinKey, MaxKey]' ]
  }
}

最重要的是,输出报告中显示 isMultiKey: false

相比之下,这是一个多键的复合索引的报告:

> db.foo.drop()
false
test> db.foo.insert({_id:1, x: [1,2,3] })
{ acknowledged: true, insertedIds: { '0': 1 } }
test> db.foo.createIndex({_id:1, x:1})
_id_1_x_1
test> db.foo.find().hint("_id_1_x_1").explain().queryPlanner.winningPlan.inputStage
{
  stage: 'IXSCAN',
  keyPattern: { _id: 1, x: 1 },
  indexName: '_id_1_x_1',
  isMultiKey: true,
  multiKeyPaths: { _id: [], x: ['x'] },
  isUnique: false,
  isSparse: false,
  isPartial: false,
  indexVersion: 2,
  direction: 'forward',
  indexBounds: { _id: [ '[MinKey, MaxKey]' ], x: [ '[MinKey, MaxKey]' ] }
}

因此,在你的情况下,多键索引边界的考虑都不适用。

话虽如此,还有一件事需要注意。你的索引以 _id 字段为前缀,MongoDB中的 _id 字段从定义上就是唯一的。如果你打算在查询中使用这个索引(或查询),并且在 _id 字段上有相等条件,那么你将不得不强制数据库使用它。MongoDB的当前版本在看到这样的条件时,默认使用默认的 { _id: 1 } 索引。

通过恢复你期望的索引并运行 .explain() 一个相关的查询,我们可以看到:

test> db.foo.drop()
false
test> db.foo.createIndex({ _id: 1, Title: 1, "Scheduling.From": 1 })
_id_1_Title_1_Scheduling.From_1
test> db.foo.find({_id: 123, Title: 456, "Scheduling.From": 789}).explain().queryPlanner.winningPlan
{
  stage: 'FETCH',
  filter: {
    '$and': [
      { 'Scheduling.From': { '$eq': 789 } },
      { Title: { '$eq': 456 } }
    ]
  },
  inputStage: {
    stage: 'IXSCAN',
    keyPattern: { _id: 1 },
    indexName: '_id_',
    isMultiKey: false,
    multiKeyPaths: { _id: [] },
    isUnique: true,
    isSparse: false,
    isPartial: false,
    indexVersion: 2,
    direction: 'forward',
    indexBounds: { _id: [ '[123, 123]' ] }
  }
}

通过 .hint() 强制索引使用示范了索引可以在需要时使用(并且有效地使用):

test> db.foo.find({_id: 123, Title: 456, "Scheduling.From": 789}).hint("_id_1_Title_1_Scheduling.From_1").explain().queryPlanner.winningPlan
{
  stage: 'FETCH',
  inputStage: {
    stage: 'IXSCAN',
    keyPattern: { _id: 1, Title: 1, 'Scheduling.From': 1 },
    indexName: '_id_1_Title_1_Scheduling.From_1',
    isMultiKey: false,
    multiKeyPaths: { _id: [], Title: [], 'Scheduling.From': [] },
    isUnique: false,
    isSparse: false,
    isPartial: false,
    indexVersion: 2,
    direction: 'forward',
    indexBounds: {
      _id: [ '[123, 123]' ],
      Title: [ '[456, 456]' ],
      'Scheduling.From': [ '[789, 789]' ]
    }
  }
}
英文:

Expanding on my comment, the situation that you described would not result in the index being multikey (in MongoDB terminology). Multikey indexes arise from the following conditions as noted in the documentation:

>MongoDB automatically creates a multikey index if any indexed field is an array; you do not need to explicitly specify the multikey type.

No fields in your document, and therefore none of the indexed fields, are arrays. We can prove that the index is not multikey via the following test:

test> db.foo.drop()
false
test> db.foo.insert({ _id: ObjectId(), Title: "", Description: "", Scheduling: { From: 20230202, To: 20230201, MagicValue: 12 }, Mandatory: true, Type: "None" })
{
  acknowledged: true,
  insertedIds: { '0': ObjectId("649de6208c8616ab438dd397") }
}
test> db.foo.createIndex({ _id: 1, Title: 1, "Scheduling.From": 1 })
_id_1_Title_1_Scheduling.From_1
test> db.foo.find().hint("_id_1_Title_1_Scheduling.From_1").explain().queryPlanner.winningPlan.inputStage
{
  stage: 'IXSCAN',
  keyPattern: { _id: 1, Title: 1, 'Scheduling.From': 1 },
  indexName: '_id_1_Title_1_Scheduling.From_1',
  isMultiKey: false,
  multiKeyPaths: { _id: [], Title: [], 'Scheduling.From': [] },
  isUnique: false,
  isSparse: false,
  isPartial: false,
  indexVersion: 2,
  direction: 'forward',
  indexBounds: {
    _id: [ '[MinKey, MaxKey]' ],
    Title: [ '[MinKey, MaxKey]' ],
    'Scheduling.From': [ '[MinKey, MaxKey]' ]
  }
}

Most specifically, the output reports isMultiKey: false.

By contrast, here is what is reported for a compound index that is multikey:

> db.foo.drop()
false
test> db.foo.insert({_id:1, x: [1,2,3] })
{ acknowledged: true, insertedIds: { '0': 1 } }
test> db.foo.createIndex({_id:1, x:1})
_id_1_x_1
test> db.foo.find().hint("_id_1_x_1").explain().queryPlanner.winningPlan.inputStage
{
  stage: 'IXSCAN',
  keyPattern: { _id: 1, x: 1 },
  indexName: '_id_1_x_1',
  isMultiKey: true,
  multiKeyPaths: { _id: [], x: [ 'x' ] },
  isUnique: false,
  isSparse: false,
  isPartial: false,
  indexVersion: 2,
  direction: 'forward',
  indexBounds: { _id: [ '[MinKey, MaxKey]' ], x: [ '[MinKey, MaxKey]' ] }
}

So none of the considerations for multikey index bounds apply in your situation.

That said, there is one other thing to note. Your index is prefixed with the _id field which is unique by definition in MongoDB. If the query (or queries) that you intend to use this index for have an equality condition on the _id field then you will have to force the database to use it. Current versions of MongoDB will default to using the default { _id: 1 } index anytime they see such a condition.

We can see that by restoring your desired index and running .explain() a relevant query:

test> db.foo.drop()
false
test> db.foo.createIndex({ _id: 1, Title: 1, "Scheduling.From": 1 })
_id_1_Title_1_Scheduling.From_1
test> db.foo.find({_id: 123, Title: 456, "Scheduling.From": 789}).explain().queryPlanner.winningPlan
{
  stage: 'FETCH',
  filter: {
    '$and': [
      { 'Scheduling.From': { '$eq': 789 } },
      { Title: { '$eq': 456 } }
    ]
  },
  inputStage: {
    stage: 'IXSCAN',
    keyPattern: { _id: 1 },
    indexName: '_id_',
    isMultiKey: false,
    multiKeyPaths: { _id: [] },
    isUnique: true,
    isSparse: false,
    isPartial: false,
    indexVersion: 2,
    direction: 'forward',
    indexBounds: { _id: [ '[123, 123]' ] }
  }
}

Forcing index usage via .hint() demonstrates that the index can be used (and used efficiently) if needed:

 test> db.foo.find({_id: 123, Title: 456, "Scheduling.From": 789}).hint("_id_1_Title_1_Scheduling.From_1").explain().queryPlanner.winningPlan
{
  stage: 'FETCH',
  inputStage: {
    stage: 'IXSCAN',
    keyPattern: { _id: 1, Title: 1, 'Scheduling.From': 1 },
    indexName: '_id_1_Title_1_Scheduling.From_1',
    isMultiKey: false,
    multiKeyPaths: { _id: [], Title: [], 'Scheduling.From': [] },
    isUnique: false,
    isSparse: false,
    isPartial: false,
    indexVersion: 2,
    direction: 'forward',
    indexBounds: {
      _id: [ '[123, 123]' ],
      Title: [ '[456, 456]' ],
      'Scheduling.From': [ '[789, 789]' ]
    }
  }
}

huangapple
  • 本文由 发表于 2023年6月30日 00:06:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/76582798.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定