How to create an Elasticsearch filter aggregation which aggregates on two fields?

Question

I have two Elasticsearch keyword fields which are IDs. Both fields could contain the same value.

I have been creating a filter aggregation to get the top 10 counts by one of the fields (with some filter applied, which isn't relevant to this question).

I now need to create a filter aggregation to get the top 10 counts based on both of the fields.

So for a given ID '1234', it would be the count of documents where field1 = 1234 OR field2 = 1234. Getting the count for a single ID is easy of course, but is it possible to use two fields in an aggregation like that?

I am using Elasticsearch v7.12.1. I noted that 'combined_fields' was introduced in v7.13 - I don't know if it is relevant at all.

Answer 1

Score: 0

Updated answer

TL;DR

This is not possible with an aggregation alone.

You will have to group those terms into a single field somehow.
You could use a runtime field (so you do not need to reindex).

Solution

GET /76328180/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "field_agg"
      }
    }
  }
}

But what is field_agg?

In this case, it is an array holding both IDs, written when the documents are indexed:

POST _bulk
{"index":{"_index": "76328180"}}
{"field_1": 1234, "field_2": 234, "field_agg": [1234, 234]}
{"index":{"_index": "76328180"}}
{"field_1": 1234, "field_2": 345, "field_agg": [1234, 345]}
{"index":{"_index": "76328180"}}
{"field_1": 123, "field_2": 1234, "field_agg": [123, 1234]}
{"index":{"_index": "76328180"}}
{"field_1": 234, "field_2": 4567, "field_agg": [234, 4567]}

Initial answer

TL;DR

Sounds like a job for the filters aggregation.

Solution

With the following query:

GET /76328180/_search
{
  "size": 0, 
  "aggs": {
    "values": {
      "filters": {
        "filters": {
          "1234": {
            "bool": {
              "should": [
                {
                  "term": {
                    "field_1": 1234
                  }
                },
                {
                  "term": {
                    "field_2": 1234
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}

Should give you:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "values": {
      "buckets": {
        "1234": {
          "doc_count": 3
        }
      }
    }
  }
}

To reproduce:

Set up:

POST _bulk
{"index":{"_index": "76328180"}}
{"field_1": 1234, "field_2": 234}
{"index":{"_index": "76328180"}}
{"field_1": 1234, "field_2": 345}
{"index":{"_index": "76328180"}}
{"field_1": 123, "field_2": 1234}
{"index":{"_index": "76328180"}}
{"field_1": 234, "field_2": 4567}

huangapple
  • Published on May 25, 2023, 08:30:25
  • Link to this article: https://go.coder-hub.com/76328180.html