在嵌套字段上使用的Elasticsearch查询条件

huangapple go评论160阅读模式
英文:

Elasticsearch query on a nested field with condition

问题

Elasticsearch v7.0

你好,祝你有美好的一天!

我正在尝试创建一个查询,其中有一个条件:如果一个嵌套字段只有一个元素,那么获取第一个元素;如果一个嵌套字段有两个或更多元素,那么获取匹配的嵌套字段条件。

情境:

我有一个名为 "socialmedia" 的索引,它有一个名为 "cms" 的嵌套字段,用于为该文档设置情感。

"cms" 字段的示例文档如下:

"_id" : 1,
"cms" : [
    {
      "cli_id" : 0,
      "cmx_sentiment" : "Negative"
    }
]

这个 "cms" 字段默认包含 "cli_id" : 0 作为其第一个元素(这意味着所有客户/用户都可以看到)。但是,不久之后,它会变成这样:

"_id": 1,
"cms" : [
    {
      "cli_id" : 0,
      "cmx_sentiment" : "Negative"
    },
    {
      "cli_id" : 1,
      "cmx_sentiment" : "Positive"
    },
    {
      "cli_id" : 2,
      "cmx_sentiment" : "Neutral"
    }
]

第二和第三个元素显示,具有 "cli_id" 等于 1 和 2 的客户对该文档进行了情感标记。

现在,我想要创建一个查询,如果登录的客户对特定文档还没有情感标记,那么它将获取具有 "cli_id" : 0 的 "cmx_sentiment"。

但是,如果已登录的客户对根据他的筛选条件提取的文档有情感标记,查询将获取与已登录客户的 cli_id 匹配的 "cmx_sentiment"。

例如:
"具有 cli_id 为 2 的客户将获取上面给定文档的 'Neutral' cmx_sentiment"
"具有 cli_id 为 5 的客户将获取 'Negative' cmx_sentiment,因为他尚未对该文档进行情感标记"

伪代码:

如果文档由客户指定了情感标记,请获取 "cli_id" 等于客户的 ID 的 "cmx_sentiment"
如果文档是新的或客户尚未在该文档上标记情感,请获取具有 "cli_id" == 0 的元素的 "cmx_sentiment"

我需要一个符合上述伪代码条件的查询。

这是我的示例查询:

"aggs" => [
    "CMS" => [
        "nested" => [
            "path" => "cms",
        ],
        "aggs" => [
            "FILTER" => [
                "filter" => [
                    "bool" => [
                        "should" => [
                            [
                                "match" => [
                                    "cms.cli_id" => 0
                                ]
                            ],
                            [
                                "bool" => [
                                    "must" => [
                                        [
                                            // 我计划在这里创建一个布尔方法,以测试 cli_id 是否等于已登录客户的 ID
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                "aggs"=> [
                    "TONALITY"=> [
                        "terms"=> [
                            "field" => "cms.cmx_sentiment"
                        ],
                    ]
                ]
            ]
        ]
    ]
]

我的查询是否正确?

我提供的查询存在问题,它会将所有元素相加,而不是仅获取一个元素。

上述查询提供了以下情景:

客户的 cli_id 为 2 登录

它检索了 "Neutral" 和 "Negative" 的 cmx_sentiment,而不仅仅是 "Neutral"。

英文:

Elasticsearch v7.0

Hello and good day!

I'm trying to create a query that will have a condition: if a nested field has only 1 element, get that first element, if a nested field has 2 more or elements, get a matching nested field condition

Scenario:

I have an index named socialmedia and has a nested field named cms which places a sentiment for that document

An example document of the cms field looks like this

"_id" : 1,
"cms" : [
    {
      "cli_id" : 0,
      "cmx_sentiment" : "Negative"
    }
]

This cms field contains "cli_id" : 0 by default for its 1st element (this means it is for all the clients/users to see) but sooner or later, it goes like this:

"_id": 1,
"cms" : [
    {
      "cli_id" : 0,
      "cmx_sentiment" : "Negative"
    },
    {
      "cli_id" : 1,
      "cmx_sentiment" : "Positive"
    },
    {
      "cli_id" : 2,
      "cmx_sentiment" : "Neutral"
    },
]

The 2nd and 3rd element shows that the clients with cli_id equals to 1 and 2 has made a sentiment for that document.

Now, I want to formulate a query that if the client who logged in has no sentiment yet for a specific document, it fetches the cmx_sentiment that has the "cli_id" : 0

BUT , if the client who has logged in has a sentiment for the fetched documents according to his filters, the query will fetch the cmx_sentiment that has the matching cli_id of the logged in client

for example:
the client who has a cli_id of 2, will get the cmx_sentiment of **Neutral** according to the given document above

the client who has a cli_id of 5, will get the cmx_sentiment of **Negative** because he hasn't given a sentiment to the document

PSEUDO CODE :

If a document has a sentiment indicated by the client, get the cmx_sentiment of the cli_id == to the client's ID

if a document is fresh or the client HAS NOT labeled yet a sentiment on that document, get the element's cmx_sentiment that has cli_id == 0

I'm in need of a query to condition for the pseudo code above

Here's my sample query:

"aggs" => [
    "CMS" => [
        "nested" => [
            "path" => "cms",
        ],
        "aggs" => [
            "FILTER" => [
                "filter" => [
                    "bool" => [
                        "should" => [
                            [
                                "match" => [
                                    "cms.cli_id" => 0
                                ]
                            ],
                            [
                                "bool" => [
                                    "must" => [
                                        [
                                            // I'm planing to create a bool method here to test if cli_id is equalis to the logged-in client's ID
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                "aggs"=> [
                    "TONALITY"=> [
                        "terms"=> [
                            "field" => "cms.cmx_sentiment"
                        ],
                    ]
                ]
            ]
        ]
    ]
]

Is my query correct?

The problem with the query I have provided, is that it SUMS all the elements, instead of picking one only

The query above provides this scenario:

The client with cli_id 2 logs in

Both the Neutral and Negative cmx_sentiment are being retrieved, instead of the Neutral alone

答案1

得分: 1

以下是您要翻译的内容:

"After the discussion with OP I'm rewriting this answer."

To get the desired result you will have to consider the following to build the query and aggregation:

Query:

This will contain any filter applied by the logged-in user. For the example purpose, I'm using match_all since every document has at least one nested doc against the cms field, i.e., for cli_id: 0

Aggregation:

Here we have to divide the aggregations into two:

  1. default_only
  2. sentiment_only

default_only

In this aggregation, we find the count for those documents which don't have nested documents for cli_id: <logged-in client id>. i.e., only those docs which have nested docs for cli_id: 0.
To do this, we follow the steps below:

  1. default_only: Use filter aggregation to get documents that do not have nested documents for cli_id: <logged-in client id>, i.e., using must_not => cli_id: <logged-in client id>
  2. default_nested: Add sub-aggregation for nested docs since we need to get the docs against sentiment which is a field of the nested document.
  3. sentiment_for_cli_id: Add sub-aggregation to default_nested aggregation in order to get sentiment only for the default client, i.e., for cli_id: 0.
  4. default: Add this terms sub-aggregation to sentiment_for_cli_id aggregation to get counts against the sentiment. Note that this count is of nested docs, and since you always have only one nested doc per cli_id, therefore this count seems to be the count of docs but it is not.
  5. the_doc_count: Add this reverse_nested aggregation to get out of nested doc aggs and the count of parent docs. We add this as the sub-aggregation of default aggregation.

sentiment_only

This aggregation gives a count against each sentiment where cli_id: <logged-in client id> is present. For this, we follow the same approach as we followed for default_only aggregation. But with some tweaks as below:

  1. sentiment_only: must => cli_id: <logged-in client id>
  2. sentiment_nested: same reason as above
  3. sentiment_for_cli_id: same but instead of default, we filter for cli_id: <logged-in client id>
  4. sentiment: same as default
  5. the_doc_count: same as above

Example:

PUT socialmedia/_bulk
{"index":{"_id":1}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"}]}
{"index":{"_id":2}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Neutral"}]}
{"index":{"_id":3}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Negative"}]}
{"index":{"_id":4}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Neutral"}]}

Query:

GET socialmedia/_search
{
"query": {
"match_all": {}
},
"aggs": {
"default_only": {
"filter": {
"bool": {
"must_not": [
{
"nested": {
"path": "cms",
"query": {
"term": {
"cms.cli_id": 2
}
}
}
]
}
},
"aggs": {
"default_nested": {
"nested": {
"path": "cms"
},
"aggs": {
"sentiment_for_cli_id": {
"filter": {
"term": {
"cms.cli_id": 0
}
},
"aggs": {
"default": {
"terms": {
"field": "cms.cmx_sentiment"
},
"aggs": {
"the_doc_count": {
"reverse_nested": {}
}
}
}
}
}
}
}
},
"sentiment_only": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "cms",
"query": {
"term": {
"cms.cli_id": 2
}
}
}
]
}
},
"aggs": {
"sentiment_nested": {
"nested": {
"path": "cms"
},
"aggs": {
"sentiment_for_cli_id": {
"filter": {
"term": {
"cms.cli_id": 2
}
},
"aggs": {
"sentiment": {
"terms": {
"field": "cms.cmx_sentiment"
},
"aggs": {
"the_doc_count": {
"reverse_nested": {}
}
}
}
}
}
}
}
}
}
}
}

Agg Output:

"aggregations" : {
"default_only" : {
"doc_count" : 1,
"default_nested" : {
"doc_count" : 1,
"sentiment_for_cli_id" : {
"doc_count" : 1,
"default" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Positive",
"doc_count" : 1,
"the_doc_count" : {
"doc_count" : 1
}
}
]
}
}
}
},
"sentiment_only" : {
"doc_count" : 3,
"sentiment_nested" : {
"doc_count" : 6,
"sentiment_for_cli_id" : {
"doc_count" : 3,
"sentiment" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Neutral",
"doc_count" : 2,
"the_doc_count" : {
"doc_count" : 2
}
},
{
"key" : "Negative",
"doc_count" : 1,
"the_doc_count" : {
"doc_count" : 1
}
}
]
}
}
}
}
}

英文:

After the discussion with OP I'm rewriting this answer.

To get the desired result you will have to consider the following to build the query and aggregation:

###Query:
This will contain any filter applied by logged in user. For the example purpose I'm using match_all since every document has atleast one nested doc against cms field i.e. for cli_id: 0

###Aggregation:
Here we have to divide the aggregations into two:

  1. default_only
  2. sentiment_only

####default_only
In this aggregation we find count for those document which don't have nested document for cli_id: &lt;logged in client id&gt;. i.e. only those docs which have nested doc for cli_id: 0.
To do this we follow the steps below:

  1. default_only Use filter aggregation to get document which does not have nested document for cli_id: &lt;logged in client id&gt; i.e. using must_not => cli_id: &lt;logged in client id&gt;
  2. default_nested : Add sub aggregation for nested docs since we need to get the docs against sentiment which is field of nested document.
  3. sentiment_for_cli_id : Add sub aggregation to default_nested aggregation in order to get sentiment only for default client i.e. for cli_id: 0.
  4. default : Add this terms sub aggregation to sentiment_for_cli_id aggregation to get counts against the sentiment. Note that this count is of nested docs and since you always have only one nested doc per cli_id therefore this count seems to be the count of docs but it is not.
  5. the_doc_count: Add this reverse_nested aggregation to get out of nested doc aggs and the count of parent docs. We add this as the sub aggregation of default aggregation.

####sentiment_only
This aggregation give count against each sentiment where cli_id: &lt;logged in client id&gt; is present. For this we follow the same approach as we followed for default_only aggregation. But with some tweaks as below:

  1. sentiment_only : must => cli_id: &lt;logged in client id&gt;
  2. sentiment_nested : same reason as above
  3. sentiment_for_cli_id: same but instead of default we filter for cli_id: &lt;logged in client id&gt;
  4. sentiment: same as default
  5. the_doc_count: same as above

###Example:

PUT socialmedia/_bulk
{&quot;index&quot;:{&quot;_id&quot;: 1}}
{&quot;cms&quot;:[{&quot;cli_id&quot;:0,&quot;cmx_sentiment&quot;:&quot;Positive&quot;}]}
{&quot;index&quot;:{&quot;_id&quot;: 2}}
{&quot;cms&quot;:[{&quot;cli_id&quot;:0,&quot;cmx_sentiment&quot;:&quot;Positive&quot;},{&quot;cli_id&quot;:2,&quot;cmx_sentiment&quot;:&quot;Neutral&quot;}]}
{&quot;index&quot;:{&quot;_id&quot;: 3}}
{&quot;cms&quot;:[{&quot;cli_id&quot;:0,&quot;cmx_sentiment&quot;:&quot;Positive&quot;},{&quot;cli_id&quot;:2,&quot;cmx_sentiment&quot;:&quot;Negative&quot;}]}
{&quot;index&quot;:{&quot;_id&quot;: 4}}
{&quot;cms&quot;:[{&quot;cli_id&quot;:0,&quot;cmx_sentiment&quot;:&quot;Positive&quot;},{&quot;cli_id&quot;:2,&quot;cmx_sentiment&quot;:&quot;Neutral&quot;}]}

####Query:

GET socialmedia/_search
{
  &quot;query&quot;: {
    &quot;match_all&quot;: {}
  },
  &quot;aggs&quot;: {
    &quot;default_only&quot;: {
      &quot;filter&quot;: {
        &quot;bool&quot;: {
          &quot;must_not&quot;: [
            {
              &quot;nested&quot;: {
                &quot;path&quot;: &quot;cms&quot;,
                &quot;query&quot;: {
                  &quot;term&quot;: {
                    &quot;cms.cli_id&quot;: 2
                  }
                }
              }
            }
          ]
        }
      },
      &quot;aggs&quot;: {
        &quot;default_nested&quot;: {
          &quot;nested&quot;: {
            &quot;path&quot;: &quot;cms&quot;
          },
          &quot;aggs&quot;: {
            &quot;sentiment_for_cli_id&quot;: {
              &quot;filter&quot;: {
                &quot;term&quot;: {
                  &quot;cms.cli_id&quot;: 0
                }
              },
              &quot;aggs&quot;: {
                &quot;default&quot;: {
                  &quot;terms&quot;: {
                    &quot;field&quot;: &quot;cms.cmx_sentiment&quot;
                  },
                  &quot;aggs&quot;: {
                    &quot;the_doc_count&quot;: {
                      &quot;reverse_nested&quot;: {}
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    &quot;sentiment_only&quot;: {
      &quot;filter&quot;: {
        &quot;bool&quot;: {
          &quot;must&quot;: [
            {
              &quot;nested&quot;: {
                &quot;path&quot;: &quot;cms&quot;,
                &quot;query&quot;: {
                  &quot;term&quot;: {
                    &quot;cms.cli_id&quot;: 2
                  }
                }
              }
            }
          ]
        }
      },
      &quot;aggs&quot;: {
        &quot;sentiment_nested&quot;: {
          &quot;nested&quot;: {
            &quot;path&quot;: &quot;cms&quot;
          },
          &quot;aggs&quot;: {
            &quot;sentiment_for_cli_id&quot;: {
              &quot;filter&quot;: {
                &quot;term&quot;: {
                  &quot;cms.cli_id&quot;: 2
                }
              },
              &quot;aggs&quot;: {
                &quot;sentiment&quot;: {
                  &quot;terms&quot;: {
                    &quot;field&quot;: &quot;cms.cmx_sentiment&quot;
                  },
                  &quot;aggs&quot;: {
                    &quot;the_doc_count&quot;: {
                      &quot;reverse_nested&quot;: {}
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

####Agg Output:

 &quot;aggregations&quot; : {
    &quot;default_only&quot; : {
      &quot;doc_count&quot; : 1,
      &quot;default_nested&quot; : {
        &quot;doc_count&quot; : 1,
        &quot;sentiment_for_cli_id&quot; : {
          &quot;doc_count&quot; : 1,
          &quot;default&quot; : {
            &quot;doc_count_error_upper_bound&quot; : 0,
            &quot;sum_other_doc_count&quot; : 0,
            &quot;buckets&quot; : [
              {
                &quot;key&quot; : &quot;Positive&quot;,
                &quot;doc_count&quot; : 1,
                &quot;the_doc_count&quot; : {
                  &quot;doc_count&quot; : 1
                }
              }
            ]
          }
        }
      }
    },
    &quot;sentiment_only&quot; : {
      &quot;doc_count&quot; : 3,
      &quot;sentiment_nested&quot; : {
        &quot;doc_count&quot; : 6,
        &quot;sentiment_for_cli_id&quot; : {
          &quot;doc_count&quot; : 3,
          &quot;sentiment&quot; : {
            &quot;doc_count_error_upper_bound&quot; : 0,
            &quot;sum_other_doc_count&quot; : 0,
            &quot;buckets&quot; : [
              {
                &quot;key&quot; : &quot;Neutral&quot;,
                &quot;doc_count&quot; : 2,
                &quot;the_doc_count&quot; : {
                  &quot;doc_count&quot; : 2
                }
              },
              {
                &quot;key&quot; : &quot;Negative&quot;,
                &quot;doc_count&quot; : 1,
                &quot;the_doc_count&quot; : {
                  &quot;doc_count&quot; : 1
                }
              }
            ]
          }
        }
      }
    }
  }

huangapple
  • 本文由 发表于 2020年1月3日 14:28:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/59574104.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定