Query aggregation buckets for unique sets of terms in Elasticsearch
Question
Given the following index:
PUT /example
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "keyword"
      }
    }
  }
}
POST example/_bulk
{ "create" : { "_index" : "example" } }
{ "tags" : ["a", "b"] }
{ "create" : { "_index" : "example" } }
{ "tags" : ["c", "d"] }
{ "create" : { "_index" : "example" } }
{ "tags" : ["e"] }
{ "create" : { "_index" : "example" } }
{ "tags" : ["c", "d"] }
I want to aggregate the documents by their unique set of tags, rather than getting one bucket per individual tag. This is similar to a multi-terms aggregation, but over a single field. The response I'm looking for looks like this:
{
  ...
  "aggregations" : {
    "tags" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : ["a", "b"],
          "key_as_string" : "a|b",
          "doc_count" : 1
        },
        {
          "key" : ["c", "d"],
          "key_as_string" : "c|d",
          "doc_count" : 2
        },
        {
          "key" : ["e"],
          "key_as_string" : "e",
          "doc_count" : 1
        }
      ]
    }
  }
}
I know one way to achieve this is to create another field that is a sorted string of all the tags and then do an aggregation on that field, but I just want to know if this is possible. My real use case is a little more complicated, using nested fields, so I'd like to avoid adding a new field.
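For reference, the workaround mentioned above (indexing an extra field that holds the sorted tags as one string) could be sketched client-side as follows. This is only an illustration: the field name `tags_key`, the `|` separator, and the helper names are assumptions that do not appear in the original setup, and de-duplicating the tags is a choice made here to match the "unique set" wording.

```python
import json


def with_tags_key(doc, sep="|"):
    """Return a copy of doc with a hypothetical 'tags_key' field added.

    The key is the sorted, de-duplicated tags joined by sep, so that
    ["d", "c"] and ["c", "d"] both produce "c|d".
    """
    enriched = dict(doc)
    enriched["tags_key"] = sep.join(sorted(set(doc["tags"])))
    return enriched


def bulk_lines(docs, index="example"):
    """Build an NDJSON body for the _bulk endpoint (one action line per document)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"create": {"_index": index}}))
        lines.append(json.dumps(with_tags_key(doc)))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"tags": ["a", "b"]},
    {"tags": ["c", "d"]},
    {"tags": ["e"]},
    {"tags": ["c", "d"]},
]

if __name__ == "__main__":
    print(bulk_lines(docs), end="")
```

A plain terms aggregation on the hypothetical `tags_key` field (mapped as keyword) would then yield one bucket per distinct tag set.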
Answer 1
Score: 1
Try a terms aggregation with a script that joins each document's tags into a single key:
PUT test_example
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "keyword"
      }
    }
  }
}
POST _bulk?refresh
{ "create" : { "_index" : "test_example" } }
{ "tags" : ["a", "b"] }
{ "create" : { "_index" : "test_example" } }
{ "tags" : ["c", "d"] }
{ "create" : { "_index" : "test_example" } }
{ "tags" : ["e"] }
{ "create" : { "_index" : "test_example" } }
{ "tags" : ["c", "d"] }
POST test_example/_search
{
  "size": 0,
  "aggregations": {
    "tags": {
      "terms": {
        "script": {
          "source": "String.join('|', params._source.tags)",
          "lang": "painless"
        },
        "collect_mode": "breadth_first",
        "execution_hint": "map"
      }
    }
  }
}
EDIT
If you want "c|d" and "d|c" to end up in the same bucket, you can sort the tags inside the script at search time. Here's an updated query:
POST _bulk?refresh
{ "create" : { "_index" : "test_example2" } }
{ "tags" : ["c", "d"] }
{ "create" : { "_index" : "test_example2" } }
{ "tags" : ["d", "c"] }
POST test_example2/_search
{
  "size": 0,
  "aggregations": {
    "tags": {
      "terms": {
        "script": {
          "source": """
            def sortedTags = params._source.tags.stream().sorted().collect(Collectors.toList());
            String.join('|', sortedTags)
          """,
          "lang": "painless"
        },
        "collect_mode": "breadth_first",
        "execution_hint": "map"
      }
    }
  }
}
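To make the difference between the two scripts concrete, here is a small Python sketch that mimics the bucketing locally. It does not call Elasticsearch; `bucket_counts` and its `sort_tags` flag are hypothetical names used only for illustration, and the function simply builds the same `|`-joined key that the Painless scripts build from `_source`.

```python
from collections import Counter


def bucket_counts(docs, sort_tags=False, sep="|"):
    """Count documents per joined-tags key, like a terms aggregation on a scripted key.

    sort_tags=False joins tags in source order (the first query);
    sort_tags=True sorts them first (the query from the EDIT).
    """
    keys = []
    for doc in docs:
        tags = sorted(doc["tags"]) if sort_tags else doc["tags"]
        keys.append(sep.join(tags))
    return Counter(keys)


docs = [{"tags": ["c", "d"]}, {"tags": ["d", "c"]}]

if __name__ == "__main__":
    # Source order: "c|d" and "d|c" are counted as separate buckets.
    print(dict(bucket_counts(docs)))
    # Sorted: both documents fall into the single bucket "c|d".
    print(dict(bucket_counts(docs, sort_tags=True)))
```

Without sorting, the two documents produce two buckets of one document each; with sorting they share one bucket with a count of two, which is the behavior the updated query aims for.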