How to create an Elasticsearch filter aggregation which aggregates on two fields?
Question
I have two keyword elastic fields which are IDs. Both fields could contain the same value.
I have been creating a filter aggregation to get the top 10 counts by one of the fields (with some filter applied, which isn't relevant to this question).
I now need to create a filter aggregation to get the top 10 counts based on both of the fields.
So for a given ID '1234', it would be the count of documents where field1 = 1234 OR field2 = 1234. Getting the count for a single ID is easy of course, but is it possible to use two fields in an aggregation like that?
I am using Elasticsearch v7.12.1. I noted that 'combined_fields' was introduced in v7.13 - I don't know if it is relevant at all.
Answer 1
Score: 0
Updated answer
TL;DR
This is not possible with an aggregation alone.
You will have to group those terms into a single field somehow.
You could use a runtime field (so you do not need to reindex).
Solution
GET /76328180/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "field_agg"
      }
    }
  }
}
But what is field_agg?
In this case, it is an array holding the values of both fields, written when each document is indexed:
POST _bulk
{"index":{"_index": "76328180"}}
{"field_1": 1234, "field_2": 234, "field_agg": [1234, 234]}
{"index":{"_index": "76328180"}}
{"field_1": 1234, "field_2": 345, "field_agg": [1234, 345]}
{"index":{"_index": "76328180"}}
{"field_1": 123, "field_2": 1234, "field_agg": [123, 1234]}
{"index":{"_index": "76328180"}}
{"field_1": 234, "field_2": 4567, "field_agg": [234, 4567]}
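The runtime-field option mentioned in the TL;DR can be sketched as follows, so that the combined field is computed at search time instead of being written at index time. This is a minimal sketch, not a tested solution: the name field_agg_runtime is made up for illustration, the script assumes both field_1 and field_2 are mapped in the index, and it de-duplicates values so a document holding the same ID in both fields is counted once. The runtime_mappings section of the search request is available from Elasticsearch 7.11, so it should apply to 7.12.1.

```json
# Assumption: field_agg_runtime is a hypothetical name, not part of the original answer.
# Assumption: field_1 and field_2 both exist in the index mapping.
GET /76328180/_search
{
  "size": 0,
  "runtime_mappings": {
    "field_agg_runtime": {
      "type": "keyword",
      "script": {
        "source": "def seen = new HashSet(); for (def f : ['field_1', 'field_2']) { for (def v : doc[f]) { def s = v.toString(); if (seen.add(s)) { emit(s); } } }"
      }
    }
  },
  "aggs": {
    "NAME": {
      "terms": {
        "field": "field_agg_runtime",
        "size": 10
      }
    }
  }
}
```

Because the script runs for every matching document, this is likely to be slower than aggregating on an indexed field such as field_agg, so it is worth measuring on real data before relying on it.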
Initial answer
TL;DR
Sounds like a job for the filters aggregation.
Solution
With the following query:
GET /76328180/_search
{
  "size": 0,
  "aggs": {
    "values": {
      "filters": {
        "filters": {
          "1234": {
            "bool": {
              "should": [
                {
                  "term": {
                    "field_1": 1234
                  }
                },
                {
                  "term": {
                    "field_2": 1234
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}
Should give you:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "values": {
      "buckets": {
        "1234": {
          "doc_count": 3
        }
      }
    }
  }
}
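If the IDs of interest are known up front, the same request can carry one named filter per ID. The sketch below reuses the sample data from this answer; the second ID (234) is picked from that data purely for illustration. Since every ID has to be listed explicitly, this approach cannot discover the top 10 IDs on its own.

```json
# Assumption: 234 is an illustrative second ID taken from the sample documents.
GET /76328180/_search
{
  "size": 0,
  "aggs": {
    "values": {
      "filters": {
        "filters": {
          "1234": {
            "bool": {
              "should": [
                { "term": { "field_1": 1234 } },
                { "term": { "field_2": 1234 } }
              ]
            }
          },
          "234": {
            "bool": {
              "should": [
                { "term": { "field_1": 234 } },
                { "term": { "field_2": 234 } }
              ]
            }
          }
        }
      }
    }
  }
}
```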
To reproduce:
Set up:
POST _bulk
{"index":{"_index": "76328180"}}
{"field_1": 1234, "field_2": 234}
{"index":{"_index": "76328180"}}
{"field_1": 1234, "field_2": 345}
{"index":{"_index": "76328180"}}
{"field_1": 123, "field_2": 1234}
{"index":{"_index": "76328180"}}
{"field_1": 234, "field_2": 4567}