Query aggregation buckets for unique sets of terms in Elasticsearch

Question

Given the following index:

PUT /example
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "keyword"
      }
    }
  }
}

POST example/_bulk
{ "create" : { "_index" : "example" } }
{ "tags" : ["a", "b"] }
{ "create" : { "_index" : "example" } }
{ "tags" : ["c", "d"] }
{ "create" : { "_index" : "example" } }
{ "tags" : ["e"] }
{ "create" : { "_index" : "example" } }
{ "tags" : ["c", "d"] }

I want to aggregate them by the unique set of tags, rather than by all documents that contain the tags. Similar to multi-terms aggregation, but looking at one field. So the response I'm looking for looks like this:

{
  ...
  "aggregations" : {
    "tags" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [                       
        {
          "key" : ["a", "b"],
          "key_as_string" : "a|b",
          "doc_count" : 1
        },
        {
          "key" : ["c", "d"],
          "key_as_string" : "c|d",
          "doc_count" : 2
        },
        {
          "key" : ["e"],
          "key_as_string" : "e",
          "doc_count" : 1
        }
      ]
    }
  }
}

I know one way to achieve this is to create another field that is a sorted string of all the tags and then do an aggregation on that field, but I just want to know if this is possible. My real use case is a little more complicated, using nested fields, so I'd like to avoid adding a new field.
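
For reference, the workaround mentioned above (a separate field holding the sorted tags) could be derived client-side before indexing. This is only a minimal sketch: the field name `tags_key` and the helper `with_tags_key` are hypothetical, and the `|` separator is borrowed from the desired response.

```python
def with_tags_key(doc, separator="|"):
    """Return a copy of doc with a hypothetical 'tags_key' field added."""
    tags = doc.get("tags", [])
    if isinstance(tags, str):
        # A single value may be stored without the surrounding array.
        tags = [tags]
    enriched = dict(doc)
    # Sort and de-duplicate so that ["d", "c"] and ["c", "d"] share one key.
    enriched["tags_key"] = separator.join(sorted(set(tags)))
    return enriched


docs = [
    {"tags": ["a", "b"]},
    {"tags": ["c", "d"]},
    {"tags": ["e"]},
    {"tags": ["d", "c"]},
]
enriched_docs = [with_tags_key(doc) for doc in docs]
for doc in enriched_docs:
    print(doc)
```

A plain terms aggregation on such a field (mapped as keyword) would then give one bucket per unique set of tags, at the cost of maintaining the extra field.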

Answer 1

Score: 1

Try this: a terms aggregation with a script that joins each document's tags into a single key.

PUT test_example
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "keyword"
      }
    }
  }
}

POST _bulk?refresh
{ "create" : { "_index" : "test_example" } }
{ "tags" : ["a", "b"] }
{ "create" : { "_index" : "test_example" } }
{ "tags" : ["c", "d"] }
{ "create" : { "_index" : "test_example" } }
{ "tags" : ["e"] }
{ "create" : { "_index" : "test_example" } }
{ "tags" : ["c", "d"] }

POST test_example/_search
{
  "size": 0,
  "aggregations": {
    "tags": {
      "terms": {
        "script": {
          "source": "String.join('|', params._source.tags)",
          "lang": "painless"
        },
        "collect_mode": "breadth_first",
        "execution_hint": "map"
      }
    }
  }
}
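
To see which bucket keys the script above produces for the sample documents, the same join can be reproduced outside Elasticsearch. The Python sketch below only mimics the bucketing logic (join the tags with `|`, then count identical keys); it is not how Elasticsearch executes the aggregation, and the helper name `bucket_key` is made up for illustration.

```python
from collections import Counter

# Same documents as in the bulk request above.
docs = [
    {"tags": ["a", "b"]},
    {"tags": ["c", "d"]},
    {"tags": ["e"]},
    {"tags": ["c", "d"]},
]


def bucket_key(tags, separator="|"):
    """Mirror String.join('|', tags): keep the stored order, join with '|'."""
    return separator.join(tags)


# One counter entry per distinct key, analogous to one bucket per key.
buckets = Counter(bucket_key(doc["tags"]) for doc in docs)
for key, doc_count in buckets.most_common():
    print(key, doc_count)
```

Because the tags are joined in stored order, two documents land in the same bucket only when their tags appear in the same order.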

EDIT

If you want to keep "c|d" and "d|c" in the same bucket, you can sort the tags with a custom script at search time. Here's an updated query:

POST _bulk?refresh
{ "create" : { "_index" : "test_example2" } }
{ "tags" : ["c", "d"] }
{ "create" : { "_index" : "test_example2" } }
{ "tags" : ["d", "c"] }

POST test_example2/_search
{
  "size": 0,
  "aggregations": {
    "tags": {
      "terms": {
        "script": {
          "source": """
            def sortedTags = params._source.tags.stream().sorted().collect(Collectors.toList());
            String.join('|', sortedTags)
          """,
          "lang": "painless"
        },
        "collect_mode": "breadth_first",
        "execution_hint": "map"
      }
    }
  }
}
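
The effect of sorting before joining can be checked with a small Python sketch. It imitates the two scripts on the two sample documents and is only an illustration of the key construction, not of Elasticsearch internals; the variable names are made up.

```python
from collections import Counter

# Same two documents as in the bulk request above.
tag_lists = [["c", "d"], ["d", "c"]]

# Joining in stored order: the two documents get different keys.
order_sensitive = Counter("|".join(tags) for tags in tag_lists)

# Sorting first, as the updated script does: both documents share one key.
order_insensitive = Counter("|".join(sorted(tags)) for tags in tag_lists)

print(dict(order_sensitive))
print(dict(order_insensitive))
```

Without sorting there are two buckets with one document each; with sorting there is a single "c|d" bucket containing both documents.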

  • Posted on 2023-05-24 20:53:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76323766.html