2023-02-06 16:24:30

Number of nested objects in Elasticsearch

Question


I am looking for a way to get the number of nested objects, for querying, sorting, etc.
For example, given this index:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "some_id": {"type": "long"},
      "user": {
        "type": "nested",
        "properties": {
          "first": {
            "type": "keyword"
          },
          "last": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "some_id": 111,
  "user" : [
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

How can I filter by the number of users (e.g. a query fetching all documents with more than XX users)?

I was thinking of using a runtime field, but this gives an error:

GET my-index-000001/_search
{
  "runtime_mappings": {
    "num": {
      "type": "long",
      "script": {
        "source": "emit(doc['some_id'].value)"
      }
    },
    "num1": {
      "type": "long",
      "script": {
        "source": "emit(doc['user'].size())" // <- this breaks with "No field found for [user] in mapping"
      }
    }
  },
  "fields": [
    "num", "num1"
  ]
}

Is it perhaps possible using aggregations?

It would also be nice to know if I can sort the results (e.g. all documents with more than XX users, sorted in descending order by the number of users).

Thanks.

Answer 1

Score: 1


You cannot query this efficiently.

It is possible to use the following hack, but I would only do so for a one-time fetch, not for a regular use case: it reads params._source and is therefore really slow when you have a lot of docs.

{
  "query": {
    "function_score": {
      "min_score": 1,  # -> min number of nested docs to filter by
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "script_score": {
            "script": "params._source['user'].size()"
          }
        }
      ],
      "boost_mode": "replace"
    }
  }
}

It basically calculates a new score for each doc, equal to the length of the user array, and then excludes all docs scoring below min_score from the results.
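As a rough sketch of how the threshold in this hack could be parameterized, the Python snippet below only builds the request body as a plain dict and does not contact a cluster. The helper name `build_min_users_query` is hypothetical. It assumes that `min_score` keeps documents whose score is greater than or equal to the threshold, so "more than N users" is expressed as `N + 1`. Because the score is replaced by the user count, the default relevance ordering should also return the documents with the most users first.

```python
import json


def build_min_users_query(more_than: int) -> dict:
    """Build a function_score body matching docs with more than `more_than` users.

    Hypothetical helper for illustration; the field name `user` comes from the
    mapping in the question.
    """
    return {
        "query": {
            "function_score": {
                # Assumption: min_score is inclusive, so "more than N" becomes N + 1.
                "min_score": more_than + 1,
                "query": {"match_all": {}},
                "functions": [
                    {
                        "script_score": {
                            # Score each document by the length of its user array.
                            "script": "params._source['user'].size()"
                        }
                    }
                ],
                # Replace the query score with the script result.
                "boost_mode": "replace",
            }
        }
    }


if __name__ == "__main__":
    # Body for "documents with more than 1 user".
    print(json.dumps(build_min_users_query(1), indent=2))
```

The printed JSON can be sent as the body of `GET my-index-000001/_search`. The performance caveat above still applies, since every matching document's source is loaded.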

Answer 2

Score: 0


The best way to do this is to add a userCount field at indexing time (since you know how many elements there are) and then query that field using a range query. Very simple, efficient and fast.
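A minimal sketch of computing the count on the client before indexing, assuming documents are assembled as Python dicts. The helper name `with_user_count` is hypothetical, and the snippet deliberately stops short of sending anything to Elasticsearch:

```python
import json


def with_user_count(doc: dict) -> dict:
    """Return a copy of `doc` with userCount set to the number of nested users."""
    users = doc.get("user") or []
    # A single nested object may be supplied as a dict rather than a list.
    if isinstance(users, dict):
        users = [users]
    return {**doc, "userCount": len(users)}


if __name__ == "__main__":
    doc = {
        "some_id": 111,
        "user": [
            {"first": "John", "last": "Smith"},
            {"first": "Alice", "last": "White"},
        ],
    }
    # The enriched document is what would be sent to PUT my-index-000001/_doc/1.
    print(json.dumps(with_user_count(doc), indent=2))
```

Declaring `userCount` explicitly in the mapping (for example as an integer type) is probably preferable to relying on dynamic mapping, but that choice is not part of the original answer.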

Each element of the nested array is a document in itself and is thus not queryable via the root-level document.

If you cannot re-create your index, you can leverage the _update_by_query endpoint in order to add that field:

POST my-index-000001/_update_by_query?wait_for_completion=false
{
  "script": {
    "source": """
     ctx._source.userCount = ctx._source.user.size()
    """
  }
}
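Once a `userCount` field exists on every document, the filtering and sorting asked about in the question could look roughly like the following. This is a sketch that only builds the search body; the helper name `build_user_count_search` is hypothetical, and `userCount` is assumed to be mapped as a numeric field.

```python
import json


def build_user_count_search(more_than: int) -> dict:
    """Build a search body: docs with userCount > `more_than`, largest first."""
    return {
        # Range query on the precomputed count ("gt" is strictly greater than).
        "query": {"range": {"userCount": {"gt": more_than}}},
        # Sort by the same field in descending order.
        "sort": [{"userCount": {"order": "desc"}}],
    }


if __name__ == "__main__":
    # Body for "all documents with more than 1 user, sorted descending".
    print(json.dumps(build_user_count_search(1), indent=2))
```

The resulting JSON can be used as the body of `GET my-index-000001/_search`. Unlike the script-based approach, this works on an indexed field and does not need to load `_source` for each document.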
