Number of nested objects in Elasticsearch
Question
I'm looking for a way to get the number of nested objects in a document, for querying, sorting, etc.
For example, given this index:
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "some_id": { "type": "long" },
      "user": {
        "type": "nested",
        "properties": {
          "first": {
            "type": "keyword"
          },
          "last": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
PUT my-index-000001/_doc/1
{
  "some_id": 111,
  "user": [
    {
      "first": "John",
      "last": "Smith"
    },
    {
      "first": "Alice",
      "last": "White"
    }
  ]
}
How can I filter by the number of users (e.g. a query fetching all documents with more than XX users)?
I was thinking of using a runtime field, but this gives an error:
GET my-index-000001/_search
{
  "runtime_mappings": {
    "num": {
      "type": "long",
      "script": {
        "source": "emit(doc['some_id'].value)"
      }
    },
    "num1": {
      "type": "long",
      "script": {
        "source": "emit(doc['user'].size())" // <- this breaks with "No field found for [user] in mapping"
      }
    }
  },
  "fields": [
    "num", "num1"
  ]
}
Is this possible, perhaps using aggregations?
It would also be nice to know whether I can sort the results (e.g. all documents with more than XX users, sorted in descending order by the number of users).
Thanks.
Answer 1
Score: 1
You cannot query this efficiently.
It is possible to use the following hack, but I would only do it for a one-time fetch, not for a regular use case: it reads params._source and is therefore really slow when you have a lot of docs.
{
  "query": {
    "function_score": {
      "min_score": 1, # -> min number of nested docs to filter by
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": "params._source['user'].size()"
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}
It basically calculates a new score for each doc, equal to the length of the user array, and then excludes all docs scoring below min_score from the results.
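As a rough sketch of how this maps onto the question: because the score equals the number of users, "more than 2 users" would correspond to a min_score of 3, and the default ordering by _score (descending) already lists the documents with the most users first. The threshold of 3 and the null check are assumptions added for illustration; the check is meant to keep the script from failing on documents that have no user field, and the count is only meaningful when user is stored as an array in _source.

```
GET my-index-000001/_search
{
  "query": {
    "function_score": {
      "min_score": 3,
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": "params._source['user'] == null ? 0 : params._source['user'].size()"
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}
```

The same performance caveat applies: every matching document's _source is loaded to compute the score.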
Answer 2
Score: 0
The best way to do this is to add a userCount field at indexing time (since you know how many elements there are) and then query that field with a range query. This is simple, efficient and fast.
Each element of the nested array is indexed as a document in itself, and is therefore not queryable via the root-level document.
If you cannot re-create your index, you can use the _update_by_query endpoint to add that field:
POST my-index-000001/_update_by_query?wait_for_completion=false
{
  "script": {
    "source": """
      ctx._source.userCount = ctx._source.user.size()
    """
  }
}
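Once userCount is populated, filtering and sorting could look roughly like the request below. This is a minimal sketch, assuming userCount has been mapped as a numeric field (dynamic mapping would normally pick long for an integer value); the threshold of 1 is just an example standing in for XX.

```
GET my-index-000001/_search
{
  "query": {
    "range": {
      "userCount": {
        "gt": 1
      }
    }
  },
  "sort": [
    { "userCount": "desc" }
  ]
}
```

Here gt returns documents with more than one user, and the sort clause orders them from the most users to the fewest. Note that the _update_by_query script above assumes every document has a user array, so documents without one would need a null check in the script.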