2020年1月3日 14:28:14go评论160阅读模式

英文:

Elasticsearch query on a nested field with condition

问题

Elasticsearch v7.0

你好，祝你有美好的一天！

我正在尝试创建一个查询，其中有一个条件：如果一个嵌套字段只有一个元素，那么获取第一个元素；如果一个嵌套字段有两个或更多元素，那么获取匹配的嵌套字段条件。

情境：

我有一个名为 "socialmedia" 的索引，它有一个名为 "cms" 的嵌套字段，用于为该文档设置情感。

"cms" 字段的示例文档如下：

"_id" : 1,
"cms" : [
    {
      "cli_id" : 0,
      "cmx_sentiment" : "Negative"
    }
]

这个 "cms" 字段默认包含 "cli_id" : 0 作为其第一个元素（这意味着所有客户/用户都可以看到）。但是，不久之后，它会变成这样：

"_id": 1,
"cms" : [
    {
      "cli_id" : 0,
      "cmx_sentiment" : "Negative"
    },
    {
      "cli_id" : 1,
      "cmx_sentiment" : "Positive"
    },
    {
      "cli_id" : 2,
      "cmx_sentiment" : "Neutral"
    }
]

第二和第三个元素显示，具有 "cli_id" 等于 1 和 2 的客户对该文档进行了情感标记。

现在，我想要创建一个查询，如果登录的客户对特定文档还没有情感标记，那么它将获取具有 "cli_id" : 0 的 "cmx_sentiment"。

但是，如果已登录的客户对根据他的筛选条件提取的文档有情感标记，查询将获取与已登录客户的 cli_id 匹配的 "cmx_sentiment"。

例如：
"具有 cli_id 为 2 的客户将获取上面给定文档的 'Neutral' cmx_sentiment"
"具有 cli_id 为 5 的客户将获取 'Negative' cmx_sentiment，因为他尚未对该文档进行情感标记"

伪代码：

如果文档由客户指定了情感标记，请获取 "cli_id" 等于客户的 ID 的 "cmx_sentiment"
如果文档是新的或客户尚未在该文档上标记情感，请获取具有 "cli_id" == 0 的元素的 "cmx_sentiment"

我需要一个符合上述伪代码条件的查询。

这是我的示例查询：

"aggs" => [
    "CMS" => [
        "nested" => [
            "path" => "cms",
        ],
        "aggs" => [
            "FILTER" => [
                "filter" => [
                    "bool" => [
                        "should" => [
                            [
                                "match" => [
                                    "cms.cli_id" => 0
                                ]
                            ],
                            [
                                "bool" => [
                                    "must" => [
                                        [
                                            // 我计划在这里创建一个布尔方法，以测试 cli_id 是否等于已登录客户的 ID
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                "aggs"=> [
                    "TONALITY"=> [
                        "terms"=> [
                            "field" => "cms.cmx_sentiment"
                        ],
                    ]
                ]
            ]
        ]
    ]
]

我的查询是否正确？

我提供的查询存在问题，它会将所有元素相加，而不是仅获取一个元素。

上述查询提供了以下情景：

客户的 cli_id 为 2 登录

它检索了 "Neutral" 和 "Negative" 的 cmx_sentiment，而不仅仅是 "Neutral"。

英文:

Elasticsearch v7.0

Hello and good day!

I'm trying to create a query that will have a condition: if a nested field has only 1 element, get that first element, if a nested field has 2 more or elements, get a matching nested field condition

Scenario:

I have an index named socialmedia and has a nested field named cms which places a sentiment for that document

An example document of the cms field looks like this

&quot;_id&quot; : 1,
&quot;cms&quot; : [
    {
      &quot;cli_id&quot; : 0,
      &quot;cmx_sentiment&quot; : &quot;Negative&quot;
    }
]

This cms field contains "cli_id" : 0 by default for its 1st element (this means it is for all the clients/users to see) but sooner or later, it goes like this:

&quot;_id&quot;: 1,
&quot;cms&quot; : [
    {
      &quot;cli_id&quot; : 0,
      &quot;cmx_sentiment&quot; : &quot;Negative&quot;
    },
    {
      &quot;cli_id&quot; : 1,
      &quot;cmx_sentiment&quot; : &quot;Positive&quot;
    },
    {
      &quot;cli_id&quot; : 2,
      &quot;cmx_sentiment&quot; : &quot;Neutral&quot;
    },
]

The 2nd and 3rd element shows that the clients with cli_id equals to 1 and 2 has made a sentiment for that document.

Now, I want to formulate a query that if the client who logged in has no sentiment yet for a specific document, it fetches the cmx_sentiment that has the "cli_id" : 0

BUT , if the client who has logged in has a sentiment for the fetched documents according to his filters, the query will fetch the cmx_sentiment that has the matching cli_id of the logged in client

for example:
the client who has a cli_id of 2, will get the cmx_sentiment of **Neutral** according to the given document above

the client who has a cli_id of 5, will get the cmx_sentiment of **Negative** because he hasn't given a sentiment to the document

PSEUDO CODE :

If a document has a sentiment indicated by the client, get the cmx_sentiment of the cli_id == to the client's ID

if a document is fresh or the client HAS NOT labeled yet a sentiment on that document, get the element's cmx_sentiment that has cli_id == 0

I'm in need of a query to condition for the pseudo code above

Here's my sample query:

&quot;aggs&quot; =&gt; [
    &quot;CMS&quot; =&gt; [
        &quot;nested&quot; =&gt; [
            &quot;path&quot; =&gt; &quot;cms&quot;,
        ],
        &quot;aggs&quot; =&gt; [
            &quot;FILTER&quot; =&gt; [
                &quot;filter&quot; =&gt; [
                    &quot;bool&quot; =&gt; [
                        &quot;should&quot; =&gt; [
                            [
                                &quot;match&quot; =&gt; [
                                    &quot;cms.cli_id&quot; =&gt; 0
                                ]
                            ],
                            [
                                &quot;bool&quot; =&gt; [
                                    &quot;must&quot; =&gt; [
                                        [
                                            // I&#39;m planing to create a bool method here to test if cli_id is equalis to the logged-in client&#39;s ID
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                &quot;aggs&quot;=&gt; [
                    &quot;TONALITY&quot;=&gt; [
                        &quot;terms&quot;=&gt; [
                            &quot;field&quot; =&gt; &quot;cms.cmx_sentiment&quot;
                        ],
                    ]
                ]
            ]
        ]
    ]
]

Is my query correct?

The problem with the query I have provided, is that it SUMS all the elements, instead of picking one only

The query above provides this scenario:

The client with cli_id 2 logs in

Both the Neutral and Negative cmx_sentiment are being retrieved, instead of the Neutral alone

答案1

得分: 1

以下是您要翻译的内容：

"After the discussion with OP I'm rewriting this answer."

To get the desired result you will have to consider the following to build the query and aggregation:

Query:

This will contain any filter applied by the logged-in user. For the example purpose, I'm using match_all since every document has at least one nested doc against the cms field, i.e., for cli_id: 0

Aggregation:

Here we have to divide the aggregations into two:

default_only
sentiment_only

default_only

In this aggregation, we find the count for those documents which don't have nested documents for cli_id: <logged-in client id>. i.e., only those docs which have nested docs for cli_id: 0.
To do this, we follow the steps below:

default_only: Use filter aggregation to get documents that do not have nested documents for cli_id: <logged-in client id>, i.e., using must_not => cli_id: <logged-in client id>
default_nested: Add sub-aggregation for nested docs since we need to get the docs against sentiment which is a field of the nested document.
sentiment_for_cli_id: Add sub-aggregation to default_nested aggregation in order to get sentiment only for the default client, i.e., for cli_id: 0.
default: Add this terms sub-aggregation to sentiment_for_cli_id aggregation to get counts against the sentiment. Note that this count is of nested docs, and since you always have only one nested doc per cli_id, therefore this count seems to be the count of docs but it is not.
the_doc_count: Add this reverse_nested aggregation to get out of nested doc aggs and the count of parent docs. We add this as the sub-aggregation of default aggregation.

sentiment_only

This aggregation gives a count against each sentiment where cli_id: <logged-in client id> is present. For this, we follow the same approach as we followed for default_only aggregation. But with some tweaks as below:

sentiment_only: must => cli_id: <logged-in client id>
sentiment_nested: same reason as above
sentiment_for_cli_id: same but instead of default, we filter for cli_id: <logged-in client id>
sentiment: same as default
the_doc_count: same as above

Example:

PUT socialmedia/_bulk
{"index":{"_id":1}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"}]}
{"index":{"_id":2}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Neutral"}]}
{"index":{"_id":3}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Negative"}]}
{"index":{"_id":4}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Neutral"}]}

Query:

GET socialmedia/_search
{
"query": {
"match_all": {}
},
"aggs": {
"default_only": {
"filter": {
"bool": {
"must_not": [
{
"nested": {
"path": "cms",
"query": {
"term": {
"cms.cli_id": 2
}
}
}
]
}
},
"aggs": {
"default_nested": {
"nested": {
"path": "cms"
},
"aggs": {
"sentiment_for_cli_id": {
"filter": {
"term": {
"cms.cli_id": 0
}
},
"aggs": {
"default": {
"terms": {
"field": "cms.cmx_sentiment"
},
"aggs": {
"the_doc_count": {
"reverse_nested": {}
}
}
}
}
}
}
}
},
"sentiment_only": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "cms",
"query": {
"term": {
"cms.cli_id": 2
}
}
}
]
}
},
"aggs": {
"sentiment_nested": {
"nested": {
"path": "cms"
},
"aggs": {
"sentiment_for_cli_id": {
"filter": {
"term": {
"cms.cli_id": 2
}
},
"aggs": {
"sentiment": {
"terms": {
"field": "cms.cmx_sentiment"
},
"aggs": {
"the_doc_count": {
"reverse_nested": {}
}
}
}
}
}
}
}
}
}
}
}

Agg Output:

"aggregations" : {
"default_only" : {
"doc_count" : 1,
"default_nested" : {
"doc_count" : 1,
"sentiment_for_cli_id" : {
"doc_count" : 1,
"default" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Positive",
"doc_count" : 1,
"the_doc_count" : {
"doc_count" : 1
}
}
]
}
}
}
},
"sentiment_only" : {
"doc_count" : 3,
"sentiment_nested" : {
"doc_count" : 6,
"sentiment_for_cli_id" : {
"doc_count" : 3,
"sentiment" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Neutral",
"doc_count" : 2,
"the_doc_count" : {
"doc_count" : 2
}
},
{
"key" : "Negative",
"doc_count" : 1,
"the_doc_count" : {
"doc_count" : 1
}
}
]
}
}
}
}
}

英文:

After the discussion with OP I'm rewriting this answer.

To get the desired result you will have to consider the following to build the query and aggregation:

###Query:
This will contain any filter applied by logged in user. For the example purpose I'm using match_all since every document has atleast one nested doc against cms field i.e. for cli_id: 0

###Aggregation:
Here we have to divide the aggregations into two:

default_only
sentiment_only

####default_only
In this aggregation we find count for those document which don't have nested document for cli_id: <logged in client id>. i.e. only those docs which have nested doc for cli_id: 0.
To do this we follow the steps below:

default_only Use filter aggregation to get document which does not have nested document for cli_id: <logged in client id> i.e. using must_not => cli_id: <logged in client id>
default_nested : Add sub aggregation for nested docs since we need to get the docs against sentiment which is field of nested document.
sentiment_for_cli_id : Add sub aggregation to default_nested aggregation in order to get sentiment only for default client i.e. for cli_id: 0.
default : Add this terms sub aggregation to sentiment_for_cli_id aggregation to get counts against the sentiment. Note that this count is of nested docs and since you always have only one nested doc per cli_id therefore this count seems to be the count of docs but it is not.
the_doc_count: Add this reverse_nested aggregation to get out of nested doc aggs and the count of parent docs. We add this as the sub aggregation of default aggregation.

####sentiment_only
This aggregation give count against each sentiment where cli_id: <logged in client id> is present. For this we follow the same approach as we followed for default_only aggregation. But with some tweaks as below:

sentiment_only : must => cli_id: <logged in client id>
sentiment_nested : same reason as above
sentiment_for_cli_id: same but instead of default we filter for cli_id: <logged in client id>
sentiment: same as default
the_doc_count: same as above

###Example:

PUT socialmedia/_bulk
{&quot;index&quot;:{&quot;_id&quot;: 1}}
{&quot;cms&quot;:[{&quot;cli_id&quot;:0,&quot;cmx_sentiment&quot;:&quot;Positive&quot;}]}
{&quot;index&quot;:{&quot;_id&quot;: 2}}
{&quot;cms&quot;:[{&quot;cli_id&quot;:0,&quot;cmx_sentiment&quot;:&quot;Positive&quot;},{&quot;cli_id&quot;:2,&quot;cmx_sentiment&quot;:&quot;Neutral&quot;}]}
{&quot;index&quot;:{&quot;_id&quot;: 3}}
{&quot;cms&quot;:[{&quot;cli_id&quot;:0,&quot;cmx_sentiment&quot;:&quot;Positive&quot;},{&quot;cli_id&quot;:2,&quot;cmx_sentiment&quot;:&quot;Negative&quot;}]}
{&quot;index&quot;:{&quot;_id&quot;: 4}}
{&quot;cms&quot;:[{&quot;cli_id&quot;:0,&quot;cmx_sentiment&quot;:&quot;Positive&quot;},{&quot;cli_id&quot;:2,&quot;cmx_sentiment&quot;:&quot;Neutral&quot;}]}

####Query:

GET socialmedia/_search
{
  &quot;query&quot;: {
    &quot;match_all&quot;: {}
  },
  &quot;aggs&quot;: {
    &quot;default_only&quot;: {
      &quot;filter&quot;: {
        &quot;bool&quot;: {
          &quot;must_not&quot;: [
            {
              &quot;nested&quot;: {
                &quot;path&quot;: &quot;cms&quot;,
                &quot;query&quot;: {
                  &quot;term&quot;: {
                    &quot;cms.cli_id&quot;: 2
                  }
                }
              }
            }
          ]
        }
      },
      &quot;aggs&quot;: {
        &quot;default_nested&quot;: {
          &quot;nested&quot;: {
            &quot;path&quot;: &quot;cms&quot;
          },
          &quot;aggs&quot;: {
            &quot;sentiment_for_cli_id&quot;: {
              &quot;filter&quot;: {
                &quot;term&quot;: {
                  &quot;cms.cli_id&quot;: 0
                }
              },
              &quot;aggs&quot;: {
                &quot;default&quot;: {
                  &quot;terms&quot;: {
                    &quot;field&quot;: &quot;cms.cmx_sentiment&quot;
                  },
                  &quot;aggs&quot;: {
                    &quot;the_doc_count&quot;: {
                      &quot;reverse_nested&quot;: {}
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    &quot;sentiment_only&quot;: {
      &quot;filter&quot;: {
        &quot;bool&quot;: {
          &quot;must&quot;: [
            {
              &quot;nested&quot;: {
                &quot;path&quot;: &quot;cms&quot;,
                &quot;query&quot;: {
                  &quot;term&quot;: {
                    &quot;cms.cli_id&quot;: 2
                  }
                }
              }
            }
          ]
        }
      },
      &quot;aggs&quot;: {
        &quot;sentiment_nested&quot;: {
          &quot;nested&quot;: {
            &quot;path&quot;: &quot;cms&quot;
          },
          &quot;aggs&quot;: {
            &quot;sentiment_for_cli_id&quot;: {
              &quot;filter&quot;: {
                &quot;term&quot;: {
                  &quot;cms.cli_id&quot;: 2
                }
              },
              &quot;aggs&quot;: {
                &quot;sentiment&quot;: {
                  &quot;terms&quot;: {
                    &quot;field&quot;: &quot;cms.cmx_sentiment&quot;
                  },
                  &quot;aggs&quot;: {
                    &quot;the_doc_count&quot;: {
                      &quot;reverse_nested&quot;: {}
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

####Agg Output:

 &quot;aggregations&quot; : {
    &quot;default_only&quot; : {
      &quot;doc_count&quot; : 1,
      &quot;default_nested&quot; : {
        &quot;doc_count&quot; : 1,
        &quot;sentiment_for_cli_id&quot; : {
          &quot;doc_count&quot; : 1,
          &quot;default&quot; : {
            &quot;doc_count_error_upper_bound&quot; : 0,
            &quot;sum_other_doc_count&quot; : 0,
            &quot;buckets&quot; : [
              {
                &quot;key&quot; : &quot;Positive&quot;,
                &quot;doc_count&quot; : 1,
                &quot;the_doc_count&quot; : {
                  &quot;doc_count&quot; : 1
                }
              }
            ]
          }
        }
      }
    },
    &quot;sentiment_only&quot; : {
      &quot;doc_count&quot; : 3,
      &quot;sentiment_nested&quot; : {
        &quot;doc_count&quot; : 6,
        &quot;sentiment_for_cli_id&quot; : {
          &quot;doc_count&quot; : 3,
          &quot;sentiment&quot; : {
            &quot;doc_count_error_upper_bound&quot; : 0,
            &quot;sum_other_doc_count&quot; : 0,
            &quot;buckets&quot; : [
              {
                &quot;key&quot; : &quot;Neutral&quot;,
                &quot;doc_count&quot; : 2,
                &quot;the_doc_count&quot; : {
                  &quot;doc_count&quot; : 2
                }
              },
              {
                &quot;key&quot; : &quot;Negative&quot;,
                &quot;doc_count&quot; : 1,
                &quot;the_doc_count&quot; : {
                  &quot;doc_count&quot; : 1
                }
              }
            ]
          }
        }
      }
    }
  }

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

在嵌套字段上使用的Elasticsearch查询条件

问题

答案1

Query:

Aggregation:

default_only

sentiment_only

Example:

Query:

Agg Output:

如何使用go-elasticsearch获取/更新别名？

ElasticSearch: check how analyzers/tokenizers/filters applied to an index split text into tokens?

将历史时间序列数据索引到Elasticsearch数据流 – ILM

Translog durability (Elasticsearch) 翻译为：事务日志持久性（Elasticsearch）

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论