Same IDs with different _routing values

Question

According to the Elasticsearch documentation, documents with the same _id can be indexed with different _routing values. The documentation therefore states that uniqueness of _id is not guaranteed, because such docs can end up on different shards (which appears to be a feature rather than a bug).

What about the scenario where two docs with the same _id, indexed with different routing values, end up on the same shard? Consider the requests below:

    PUT test-index
    {
      "settings": {
        "index": {
          "number_of_shards": 2
        }
      }
    }
    PUT test-index/_doc/1?routing=user1
    {
      "title": "This is document number with routing=user1"
    }
    PUT test-index/_doc/1?routing=user2
    {
      "title": "This is document number with routing=user2"
    }
    GET test-index/_search

The search query returns the following result:

    {
      "took": 2,
      "timed_out": false,
      "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 1,
          "relation": "eq"
        },
        "max_score": 1,
        "hits": [
          {
            "_index": "test-index",
            "_id": "1",
            "_score": 1,
            "_routing": "user2",
            "_source": {
              "title": "This is document number with routing=user2"
            }
          }
        ]
      }
    }

Why does the search response show only the doc indexed with routing user2, even though the index has 2 shards? Both docs must have ended up on the same shard, because according to the formula:

    shard_num = (hash(_routing) % num_routing_shards) / routing_factor
    where routing_factor = num_routing_shards / num_primary_shards

In my case routing_factor is 1 (i.e. 2 routing shards / 2 primary shards), so the shard ID is simply the hash of the _routing value mod 2.

Using these routing values, we get the following shard IDs (the murmur3 values were computed with an online tool):

    murmur3("user1") % 2 = 3305849917 % 2 = shard 1
    murmur3("user2") % 2 = 4180509323 % 2 = shard 1
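
The arithmetic above can be reproduced with a short Python sketch. It takes the hash values quoted in the question as given inputs rather than recomputing murmur3, and the name `shard_num` and its parameters are illustrative, not part of any Elasticsearch client. The value of 2 routing shards is the question's assumption; on a real cluster it depends on the index's number_of_routing_shards setting.

```python
def shard_num(routing_hash: int, num_routing_shards: int, num_primary_shards: int) -> int:
    """Apply the routing formula quoted above to an already-computed hash."""
    if num_routing_shards % num_primary_shards != 0:
        raise ValueError("num_routing_shards must be a multiple of num_primary_shards")
    routing_factor = num_routing_shards // num_primary_shards
    return (routing_hash % num_routing_shards) // routing_factor


# Hash values copied from the question (not recomputed here).
QUESTION_HASHES = {"user1": 3305849917, "user2": 4180509323}

for routing, routing_hash in QUESTION_HASHES.items():
    # Assumes 2 routing shards and 2 primary shards, as in the question.
    print(routing, "-> shard", shard_num(routing_hash, 2, 2))
# prints:
# user1 -> shard 1
# user2 -> shard 1
```

Both hash values are odd, so under the question's assumptions both routing values map to shard 1.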

However, if two docs with the same _id but different _routing values end up on the same shard, why is only one doc shown?

Answer 1

Score: 1

TL;DR

Because they share the same ID on the same shard, the second request is not an insert; it is an update.

Evidence:

If you run the following commands in order:

    PUT 76349386
    {
      "settings": {
        "index": {
          "number_of_shards": 2
        }
      }
    }

Then

    PUT 76349386/_doc/1?routing=user1
    {
      "title": "This is document number with routing=user1"
    }

This will give you:

    {
      "_index": "76349386",
      "_id": "1",
      "_version": 1,
      "result": "created",    <= here it says the result of the operation was a creation
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "_seq_no": 0,
      "_primary_term": 1
    }

But when you then run the second command:

    PUT 76349386/_doc/1?routing=user2
    {
      "title": "This is document number with routing=user2"
    }

The response will look a little bit different:

    {
      "_index": "76349386",
      "_id": "1",
      "_version": 2,
      "result": "updated",    <= it is an update
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "_seq_no": 1,
      "_primary_term": 1
    }

The document with _id 1 has been updated.
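
The behaviour described above can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions, not how Elasticsearch is implemented: each shard is a plain dict keyed by _id alone, the shard is picked as hash mod number of shards (the question's routing_factor = 1 case), and the class `ToyIndex` and its methods are hypothetical names. The hash values are the ones quoted in the question.

```python
class ToyIndex:
    """Toy stand-in for an index: one dict per shard, keyed by _id only."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [{} for _ in range(num_shards)]

    def index(self, doc_id: str, routing_hash: int, source: dict) -> dict:
        # Pick the shard from the routing hash; the _id plays no part in this choice.
        shard = self.shards[routing_hash % len(self.shards)]
        previous = shard.get(doc_id)
        version = 1 if previous is None else previous["_version"] + 1
        # Within a shard the _id is the key, so an existing doc is overwritten.
        shard[doc_id] = {"_version": version, "_source": source}
        result = "created" if previous is None else "updated"
        return {"_id": doc_id, "_version": version, "result": result}

    def search(self) -> list:
        # Collect every stored document from every shard.
        return [doc["_source"] for shard in self.shards for doc in shard.values()]


index = ToyIndex(num_shards=2)
first = index.index("1", 3305849917, {"title": "routing=user1"})
second = index.index("1", 4180509323, {"title": "routing=user2"})
print(first["result"], second["result"])  # prints: created updated
print(index.search())  # prints: [{'title': 'routing=user2'}]
```

In this model, two routing hashes that select different shards would both yield "created", and a search would return two documents with the same _id, which matches the documentation's remark that _id uniqueness is not guaranteed across shards.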

Posted by huangapple on May 28, 2023, 07:15:15
Source: https://go.coder-hub.com/76349386.html