Number of nested objects in Elasticsearch

Question

Looking for a way to get the number of nested objects, for querying, sorting etc.
For example, given this index:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "some_id": {"type": "long"},
      "user": {
        "type": "nested",
        "properties": {
          "first": {
            "type": "keyword"
          },
          "last": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "some_id": 111,
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

How can I filter by the number of users (e.g. a query fetching all documents with more than XX users)?

I was thinking of using a runtime field, but this gives an error:

GET my-index-000001/_search
{
  "runtime_mappings": {
    "num": {
      "type": "long",
      "script": {
        "source": "emit(doc['some_id'].value)"
      }
    },
    "num1": {
      "type": "long",
      "script": {
        "source": "emit(doc['user'].size())" // <- this breaks with "No field found for [user] in mapping"
      }
    }
  },
  "fields": [
    "num", "num1"
  ]
}

Is this possible, perhaps using aggregations?

It would also be nice to know whether I can sort the results (e.g. all documents with more than XX users, sorted in descending order by that count).

Thanks.

Answer 1

Score: 1

You cannot query this efficiently.

It is possible to use the following hack, but I would only do so for one-time fetching, not for a regular use case: it reads params._source and is therefore really slow when you have a lot of docs.

{
  "query": {
    "function_score": {
      "min_score": 1,  # -> min number of nested docs to filter by
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": "params._source['user'].size()"
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}

It basically calculates a new score for each doc, equal to the length of the user array, and then excludes every doc whose score is below min_score from the results.
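
As a rough illustration of how the threshold in that request could be parameterized, here is a minimal Python sketch that only builds the request body as a dict. The helper name `build_min_users_query` is hypothetical, and sending the body to the cluster is left to whatever client you use. Since min_score drops docs scoring below it, "more than XX users" corresponds to a threshold of XX + 1.

```python
import json


def build_min_users_query(min_users: int) -> dict:
    """Build the function_score body that keeps docs with at least `min_users` users."""
    return {
        "query": {
            "function_score": {
                # Docs whose score falls below min_score are dropped.
                "min_score": min_users,
                "query": {"match_all": {}},
                "functions": [
                    {
                        "script_score": {
                            # The score becomes the length of the `user` array in _source.
                            "script": "params._source['user'].size()"
                        }
                    }
                ],
                # Replace the match_all score with the script result.
                "boost_mode": "replace",
            }
        }
    }


# "More than 2 users" means at least 3, so 3 is passed as the threshold.
body = build_min_users_query(3)
print(json.dumps(body, indent=2))
```

Because hits are sorted by _score in descending order when no explicit sort is given, the docs returned by this request should already come back ordered from most users to fewest.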

Answer 2

Score: 0

The best way to do this is to add a userCount field at indexing time (since you know how many elements there are) and then query that field using a range query. Very simple, efficient and fast.

Each element of the nested array is a document in itself and is therefore not queryable via the root-level document.

If you cannot re-create your index, you can leverage the _update_by_query endpoint in order to add that field:

POST my-index-000001/_update_by_query?wait_for_completion=false
{
  "script": {
    "source": """
     ctx._source.userCount = ctx._source.user.size()
    """
  }
}
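
The index-time approach described at the start of this answer might look roughly like the following Python sketch. It only prepares the document and the search body as dicts. The helper names `with_user_count` and `build_user_count_search` are hypothetical, and the sketch assumes userCount ends up mapped as a numeric field (for example through dynamic mapping).

```python
import json


def with_user_count(doc: dict) -> dict:
    """Return a copy of `doc` with a `userCount` field holding the number of users."""
    enriched = dict(doc)
    enriched["userCount"] = len(doc.get("user", []))
    return enriched


def build_user_count_search(more_than: int) -> dict:
    """Build a search body: docs with more than `more_than` users, largest counts first."""
    return {
        "query": {"range": {"userCount": {"gt": more_than}}},
        "sort": [{"userCount": {"order": "desc"}}],
    }


doc = {
    "some_id": 111,
    "user": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ],
}

# Body to index instead of the raw document.
print(json.dumps(with_user_count(doc), indent=2))
# Body for GET my-index-000001/_search: more than 1 user, sorted by count descending.
print(json.dumps(build_user_count_search(1), indent=2))
```

The range query covers the filtering part of the question and the sort clause covers the ordering part, both against an ordinary numeric field rather than a script.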

huangapple
  • Posted on February 6, 2023, 16:24:30
  • Link to this article: https://go.coder-hub.com/75358877.html