2023-05-24 20:53:18

Query aggregation buckets for unique sets of terms in Elasticsearch

Question

Given the following index:

PUT /example
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "keyword"
      }
    }
  }
}
POST example/_bulk
{ "create" : { "_index" : "example" } }
{ "tags" : ["a", "b"] }
{ "create" : { "_index" : "example" } }
{ "tags" : ["c", "d"] }
{ "create" : { "_index" : "example" } }
{ "tags" : ["e"] }
{ "create" : { "_index" : "example" } }
{ "tags" : ["c", "d"] }

I want to aggregate the documents by their unique set of tags, rather than getting one bucket per individual tag. This is similar to a multi-terms aggregation, but over a single field. The response I'm looking for looks like this:

{
  ...
  "aggregations" : {
    "tags" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : ["a", "b"],
          "key_as_string" : "a|b",
          "doc_count" : 1
        },
        {
          "key" : ["c", "d"],
          "key_as_string" : "c|d",
          "doc_count" : 2
        },
        {
          "key" : ["e"],
          "key_as_string" : "e",
          "doc_count" : 1
        }
      ]
    }
  }
}

I know one way to achieve this is to create another field that is a sorted string of all the tags and then do an aggregation on that field, but I just want to know if this is possible. My real use case is a little more complicated, using nested fields, so I'd like to avoid adding a new field.
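
For reference, the workaround mentioned above (a separate field holding a sorted string of the tags) could be built on the client before indexing. This is only a minimal sketch in Python: the field name `tags_key`, the helper `tag_set_key`, and the `|` separator are assumptions chosen for illustration, and deduplicating with `set` assumes that repeated tags should not produce a different key.

```python
def tag_set_key(tags, sep="|"):
    """Return a canonical string for a set of tags: deduplicated, sorted, joined."""
    return sep.join(sorted(set(tags)))


# The four sample documents from the question.
docs = [
    {"tags": ["a", "b"]},
    {"tags": ["c", "d"]},
    {"tags": ["e"]},
    {"tags": ["c", "d"]},
]

# Add the hypothetical `tags_key` field to each document before indexing it.
enriched = [{**doc, "tags_key": tag_set_key(doc["tags"])} for doc in docs]
```

A `terms` aggregation on such a field would then group documents by tag set. If a tag can itself contain the separator, a different separator would be needed to avoid ambiguous keys.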

Answer 1

Score: 1

Try this:

PUT test_example
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "keyword"
      }
    }
  }
}
POST _bulk?refresh
{ "create" : { "_index" : "test_example" } }
{ "tags" : ["a", "b"] }
{ "create" : { "_index" : "test_example" } }
{ "tags" : ["c", "d"] }
{ "create" : { "_index" : "test_example" } }
{ "tags" : ["e"] }
{ "create" : { "_index" : "test_example" } }
{ "tags" : ["c", "d"] }
POST test_example/_search
{
  "size": 0,
  "aggregations": {
    "tags": {
      "terms": {
        "script": {
          "source": "String.join('|', params._source.tags)",
          "lang": "painless"
        },
        "collect_mode": "breadth_first",
        "execution_hint": "map"
      }
    }
  }
}
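
To make the effect of the script easier to follow, here is a rough Python simulation of the grouping it performs: each document's tags are joined with `|` in source order and the resulting strings are counted. The function name `bucket_by_joined_tags` is hypothetical, and the sketch ignores Elasticsearch details such as bucket ordering and the `size` limit.

```python
from collections import Counter


def bucket_by_joined_tags(docs, sep="|"):
    """Count documents per joined tag string, keeping the tags in source order."""
    return dict(Counter(sep.join(doc["tags"]) for doc in docs))


# The same four documents that were bulk-indexed into test_example.
docs = [
    {"tags": ["a", "b"]},
    {"tags": ["c", "d"]},
    {"tags": ["e"]},
    {"tags": ["c", "d"]},
]

buckets = bucket_by_joined_tags(docs)
```

Note that the bucket key is a single string such as `c|d`, not an array of terms as in the response sketched in the question.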

EDIT

If you want "c|d" and "d|c" to fall into the same bucket, you can sort the tags in the script at search time. Here's an updated query:

POST _bulk?refresh
{ "create" : { "_index" : "test_example2" } }
{ "tags" : ["c", "d"] }
{ "create" : { "_index" : "test_example2" } }
{ "tags" : ["d", "c"] }
POST test_example2/_search
{
  "size": 0,
  "aggregations": {
    "tags": {
      "terms": {
        "script": {
          "source": """
            def sortedTags = params._source.tags.stream().sorted().collect(Collectors.toList());
            String.join('|', sortedTags)
          """,
          "lang": "painless"
        },
        "collect_mode": "breadth_first",
        "execution_hint": "map"
      }
    }
  }
}
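
The difference the sorting makes can be illustrated outside Elasticsearch. The following Python sketch (the helper name `bucket_tags` is hypothetical) counts the two documents above twice: first with the tags joined in source order, then with the tags sorted before joining.

```python
from collections import Counter


def bucket_tags(docs, sort_tags, sep="|"):
    """Count documents per joined tag string, optionally sorting the tags first."""
    keys = (
        sep.join(sorted(doc["tags"]) if sort_tags else doc["tags"])
        for doc in docs
    )
    return dict(Counter(keys))


# The two documents that were bulk-indexed into test_example2.
docs = [
    {"tags": ["c", "d"]},
    {"tags": ["d", "c"]},
]

unsorted_buckets = bucket_tags(docs, sort_tags=False)  # one bucket per ordering
sorted_buckets = bucket_tags(docs, sort_tags=True)     # orderings share a bucket
```

Without sorting, the two orderings end up in separate buckets; with sorting, both documents share the key `c|d`.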
