Query aggregation buckets for unique sets of terms in Elasticsearch

Question

Given the following index:

    PUT /example
    {
      "mappings": {
        "properties": {
          "tags": {
            "type": "keyword"
          }
        }
      }
    }

    POST example/_bulk
    { "create" : { "_index" : "example" } }
    { "tags" : ["a", "b"] }
    { "create" : { "_index" : "example" } }
    { "tags" : ["c", "d"] }
    { "create" : { "_index" : "example" } }
    { "tags" : ["e"] }
    { "create" : { "_index" : "example" } }
    { "tags" : ["c", "d"] }

I want to aggregate them by the unique set of tags, rather than by all documents that contain the tags. Similar to multi-terms aggregation, but looking at one field. So the response I'm looking for looks like this:

    {
      ...
      "aggregations" : {
        "tags" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : ["a", "b"],
              "key_as_string" : "a|b",
              "doc_count" : 1
            },
            {
              "key" : ["c", "d"],
              "key_as_string" : "c|d",
              "doc_count" : 2
            },
            {
              "key" : ["e"],
              "key_as_string" : "e",
              "doc_count" : 1
            }
          ]
        }
      }
    }

I know one way to achieve this is to create another field that is a sorted string of all the tags and then do an aggregation on that field, but I just want to know if this is possible. My real use case is a little more complicated, using nested fields, so I'd like to avoid adding a new field.
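
For reference, the extra-field workaround mentioned above could look roughly like the following Python sketch, which computes the sorted key on the client before indexing. The field name `tags_key` and the helper names are hypothetical, and the sketch assumes no tag contains the `|` separator.

```python
import json

def tag_set_key(tags):
    """Order-independent key for a list of tags, e.g. ["d", "c"] -> "c|d"."""
    return "|".join(sorted(tags))

def build_bulk_body(index, docs, key_field="tags_key"):
    """Build an NDJSON _bulk body, adding the precomputed key to each document.

    key_field is a hypothetical field name chosen for this illustration.
    """
    lines = []
    for doc in docs:
        enriched = dict(doc)  # copy so the caller's document is left untouched
        enriched[key_field] = tag_set_key(doc["tags"])
        lines.append(json.dumps({"create": {"_index": index}}))
        lines.append(json.dumps(enriched))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

docs = [{"tags": ["a", "b"]}, {"tags": ["c", "d"]}, {"tags": ["e"]}, {"tags": ["c", "d"]}]
body = build_bulk_body("example", docs)
print(body)
```

If that field is mapped as a keyword, an ordinary terms aggregation on it should produce one bucket per unique tag set.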

Answer 1

Score: 1

Try this, a terms aggregation whose script joins each document's tags into a single key:

    PUT test_example
    {
      "mappings": {
        "properties": {
          "tags": {
            "type": "keyword"
          }
        }
      }
    }

    POST _bulk?refresh
    { "create" : { "_index" : "test_example" } }
    { "tags" : ["a", "b"] }
    { "create" : { "_index" : "test_example" } }
    { "tags" : ["c", "d"] }
    { "create" : { "_index" : "test_example" } }
    { "tags" : ["e"] }
    { "create" : { "_index" : "test_example" } }
    { "tags" : ["c", "d"] }

    POST test_example/_search
    {
      "size": 0,
      "aggregations": {
        "tags": {
          "terms": {
            "script": {
              "source": "String.join('|', params._source.tags)",
              "lang": "painless"
            },
            "collect_mode": "breadth_first",
            "execution_hint": "map"
          }
        }
      }
    }

EDIT

If you want "c|d" and "d|c" to fall into the same bucket, you can sort the tags in the script at search time. Here's an updated query:

    POST _bulk?refresh
    { "create" : { "_index" : "test_example2" } }
    { "tags" : ["c", "d"] }
    { "create" : { "_index" : "test_example2" } }
    { "tags" : ["d", "c"] }

    POST test_example2/_search
    {
      "size": 0,
      "aggregations": {
        "tags": {
          "terms": {
            "script": {
              "source": """
                def sortedTags = params._source.tags.stream().sorted().collect(Collectors.toList());
                String.join('|', sortedTags)
              """,
              "lang": "painless"
            },
            "collect_mode": "breadth_first",
            "execution_hint": "map"
          }
        }
      }
    }
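
To see why the sort matters, here is a small Python sketch that mimics the bucketing both scripts perform on the two sample documents. It is only an illustration of the keying logic, not Elasticsearch itself, and the function name `bucket_counts` is made up for this example.

```python
from collections import Counter

def bucket_counts(docs, sort_tags):
    """Count documents per joined-tag key, as a terms aggregation would
    count the values returned by the script."""
    keys = []
    for doc in docs:
        tags = sorted(doc["tags"]) if sort_tags else doc["tags"]
        keys.append("|".join(tags))
    return Counter(keys)

docs = [{"tags": ["c", "d"]}, {"tags": ["d", "c"]}]

unsorted_buckets = bucket_counts(docs, sort_tags=False)
sorted_buckets = bucket_counts(docs, sort_tags=True)

print(dict(unsorted_buckets))  # {'c|d': 1, 'd|c': 1}
print(dict(sorted_buckets))    # {'c|d': 2}
```

Without the sort, the key follows the order in which the tags were stored in `_source`, so the same set of tags can end up in two buckets.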

huangapple
  • Posted on May 24, 2023 at 20:53:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76323766.html