2023-05-25 08:30:25

How to create an Elasticsearch filter aggregation which aggregates on two fields?

Question

I have two Elasticsearch keyword fields, both of which hold IDs. The same ID value can appear in either field.

I have been using a filter aggregation to get the top 10 counts by one of the fields (with a filter applied, which isn't relevant to this question).

I now need a filter aggregation that returns the top 10 counts based on both fields.

So for a given ID '1234', the count would be the number of documents where field1 = 1234 OR field2 = 1234. Getting the count for a single ID is easy, of course, but is it possible to use two fields in an aggregation like that?

I am using Elasticsearch v7.12.1. I noticed that 'combined_fields' was introduced in v7.13, but I don't know whether it is relevant.

Answer 1

Score: 0

Updated answer

Tldr;

This is not possible with an aggregation alone.

You will have to group those terms into a single field somehow.
You could use a runtime field (so you do not need to reindex).

Solution

GET /76328180/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "field_agg"
      }
    }
  }
}

But what is field_agg?

In this case, it is an array holding the values of both fields, written at index time:

POST _bulk
{"index":{"_index": "76328180"}}
{"field_1": 1234, "field_2": 234, "field_agg": [1234, 234]}
{"index":{"_index": "76328180"}}
{"field_1": 1234, "field_2": 345, "field_agg": [1234, 345]}
{"index":{"_index": "76328180"}}
{"field_1": 123, "field_2": 1234, "field_agg": [123, 1234]}
{"index":{"_index": "76328180"}}
{"field_1": 234, "field_2": 4567, "field_agg": [234, 4567]}
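
The answer mentions a runtime field but only shows the reindexed variant. As a rough, untested sketch of the query-time alternative, the request below defines a runtime field in the search itself and aggregates on it. It assumes field_1 and field_2 are both mapped in the index; the names combined_id and top_ids, and the Painless script, are hypothetical and not part of the original answer.

```console
GET /76328180/_search
{
  "size": 0,
  "runtime_mappings": {
    "combined_id": {
      "type": "keyword",
      "script": {
        "source": "Set ids = new HashSet(); for (def v : doc['field_1']) { ids.add(v.toString()); } for (def v : doc['field_2']) { ids.add(v.toString()); } for (String id : ids) { emit(id); }"
      }
    }
  },
  "aggs": {
    "top_ids": {
      "terms": {
        "field": "combined_id",
        "size": 10
      }
    }
  }
}
```

The script collects the values into a set so that a document holding the same ID in both fields emits it only once, and it converts each value to a string because the runtime field is declared as keyword. Runtime fields are evaluated per document at search time, so this is likely to be slower than aggregating on an indexed field.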

Initial answer

Tldr;

Sounds like a job for the filters aggregation.

Solution

With the following query:

GET /76328180/_search
{
  "size": 0,
  "aggs": {
    "values": {
      "filters": {
        "filters": {
          "1234": {
            "bool": {
              "should": [
                {
                  "term": {
                    "field_1": 1234
                  }
                },
                {
                  "term": {
                    "field_2": 1234
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}

Should give you:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "values": {
      "buckets": {
        "1234": {
          "doc_count": 3
        }
      }
    }
  }
}

To reproduce:

Set up:

POST _bulk
{"index":{"_index": "76328180"}}
{"field_1": 1234, "field_2": 234}
{"index":{"_index": "76328180"}}
{"field_1": 1234, "field_2": 345}
{"index":{"_index": "76328180"}}
{"field_1": 123, "field_2": 1234}
{"index":{"_index": "76328180"}}
{"field_1": 234, "field_2": 4567}
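
The filters aggregation only counts the IDs that are named in the request, which is presumably why the initial answer does not cover the top-10 case. As a sketch, here is the same pattern extended to two IDs taken from the sample data (1234 and 234); the bucket names are arbitrary labels.

```console
GET /76328180/_search
{
  "size": 0,
  "aggs": {
    "values": {
      "filters": {
        "filters": {
          "1234": {
            "bool": {
              "should": [
                { "term": { "field_1": 1234 } },
                { "term": { "field_2": 1234 } }
              ]
            }
          },
          "234": {
            "bool": {
              "should": [
                { "term": { "field_1": 234 } },
                { "term": { "field_2": 234 } }
              ]
            }
          }
        }
      }
    }
  }
}
```

With the four sample documents, the 1234 bucket should count 3 documents and the 234 bucket 2. Ranking the ten most frequent IDs this way would require knowing the candidate IDs in advance.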
