英文:
Elasticsearch query on a nested field with condition
问题
Elasticsearch v7.0
你好,祝你有美好的一天!
我正在尝试创建一个查询,其中有一个条件:如果一个嵌套字段只有一个元素,那么获取第一个元素;如果一个嵌套字段有两个或更多元素,那么获取匹配的嵌套字段条件。
情境:
我有一个名为 "socialmedia" 的索引,它有一个名为 "cms" 的嵌套字段,用于为该文档设置情感。
"cms" 字段的示例文档如下:
"_id" : 1,
"cms" : [
    {
      "cli_id" : 0,
      "cmx_sentiment" : "Negative"
    }
]
这个 "cms" 字段默认包含 "cli_id" : 0 作为其第一个元素(这意味着所有客户/用户都可以看到)。但是,不久之后,它会变成这样:
"_id": 1,
"cms" : [
    {
      "cli_id" : 0,
      "cmx_sentiment" : "Negative"
    },
    {
      "cli_id" : 1,
      "cmx_sentiment" : "Positive"
    },
    {
      "cli_id" : 2,
      "cmx_sentiment" : "Neutral"
    }
]
第二和第三个元素显示,具有 "cli_id" 等于 1 和 2 的客户对该文档进行了情感标记。
现在,我想要创建一个查询,如果登录的客户对特定文档还没有情感标记,那么它将获取具有 "cli_id" : 0 的 "cmx_sentiment"。
但是,如果已登录的客户对根据他的筛选条件提取的文档有情感标记,查询将获取与已登录客户的 cli_id 匹配的 "cmx_sentiment"。
例如:
"具有 cli_id 为 2 的客户将获取上面给定文档的 'Neutral' cmx_sentiment"
"具有 cli_id 为 5 的客户将获取 'Negative' cmx_sentiment,因为他尚未对该文档进行情感标记"
伪代码:
如果文档由客户指定了情感标记,请获取 "cli_id" 等于客户的 ID 的 "cmx_sentiment"
如果文档是新的或客户尚未在该文档上标记情感,请获取具有 "cli_id" == 0 的元素的 "cmx_sentiment"
我需要一个符合上述伪代码条件的查询。
这是我的示例查询:
"aggs" => [
    "CMS" => [
        "nested" => [
            "path" => "cms",
        ],
        "aggs" => [
            "FILTER" => [
                "filter" => [
                    "bool" => [
                        "should" => [
                            [
                                "match" => [
                                    "cms.cli_id" => 0
                                ]
                            ],
                            [
                                "bool" => [
                                    "must" => [
                                        [
                                            // 我计划在这里创建一个布尔方法,以测试 cli_id 是否等于已登录客户的 ID
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                "aggs"=> [
                    "TONALITY"=> [
                        "terms"=> [
                            "field" => "cms.cmx_sentiment"
                        ],
                    ]
                ]
            ]
        ]
    ]
]
我的查询是否正确?
我提供的查询存在问题,它会将所有元素相加,而不是仅获取一个元素。
上述查询提供了以下情景:
客户的 cli_id 为 2 登录
它检索了 "Neutral" 和 "Negative" 的 cmx_sentiment,而不仅仅是 "Neutral"。
英文:
Elasticsearch v7.0
Hello and good day!
I'm trying to create a query that will have a condition: if a nested field has only 1 element, get that first element, if a nested field has 2 more or elements, get a matching nested field condition
Scenario:
I have an index named  socialmedia  and has a  nested  field named  cms  which places a sentiment for that  document
An example document of the  cms  field looks like this
"_id" : 1,
"cms" : [
    {
      "cli_id" : 0,
      "cmx_sentiment" : "Negative"
    }
]
This  cms  field contains  "cli_id" : 0  by default for its 1st element (this means it is for all the clients/users to see) but sooner or later, it goes like this:
"_id": 1,
"cms" : [
    {
      "cli_id" : 0,
      "cmx_sentiment" : "Negative"
    },
    {
      "cli_id" : 1,
      "cmx_sentiment" : "Positive"
    },
    {
      "cli_id" : 2,
      "cmx_sentiment" : "Neutral"
    },
]
The 2nd and 3rd element shows that the clients with  cli_id  equals to  1  and  2  has made a sentiment for that document.
Now, I want to formulate a query that if the client who logged in has no  sentiment  yet for a specific document, it fetches the  cmx_sentiment  that has the  "cli_id" : 0
BUT , if the client who has logged in has a sentiment for the fetched documents according to his filters, the query will fetch the cmx_sentiment that has the matching cli_id of the logged in client
for example:
the client who has a cli_id of 2, will get the cmx_sentiment of **Neutral** according to the given document above
the client who has a cli_id of 5, will get the cmx_sentiment of **Negative** because he hasn't given a sentiment to the document
PSEUDO CODE :
If a document has a sentiment indicated by the client, get the  cmx_sentiment  of the  cli_id  == to the client's ID
if a document is fresh or the client  HAS NOT  labeled yet a sentiment on that document, get the element's  cmx_sentiment  that has  cli_id  == 0
I'm in need of a query to condition for the pseudo code above
Here's my sample query:
"aggs" => [
    "CMS" => [
        "nested" => [
            "path" => "cms",
        ],
        "aggs" => [
            "FILTER" => [
                "filter" => [
                    "bool" => [
                        "should" => [
                            [
                                "match" => [
                                    "cms.cli_id" => 0
                                ]
                            ],
                            [
                                "bool" => [
                                    "must" => [
                                        [
                                            // I'm planing to create a bool method here to test if cli_id is equalis to the logged-in client's ID
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                "aggs"=> [
                    "TONALITY"=> [
                        "terms"=> [
                            "field" => "cms.cmx_sentiment"
                        ],
                    ]
                ]
            ]
        ]
    ]
]
Is my query correct?
The problem with the query I have provided, is that it SUMS all the elements, instead of picking one only
The query above provides this scenario:
The client with cli_id 2 logs in
Both the Neutral and Negative cmx_sentiment are being retrieved, instead of the Neutral alone
答案1
得分: 1
以下是您要翻译的内容:
"After the discussion with OP I'm rewriting this answer."
To get the desired result you will have to consider the following to build the query and aggregation:
Query:
This will contain any filter applied by the logged-in user. For the example purpose, I'm using match_all since every document has at least one nested doc against the cms field, i.e., for cli_id: 0
Aggregation:
Here we have to divide the aggregations into two:
- default_only
 - sentiment_only
 
default_only
In this aggregation, we find the count for those documents which don't have nested documents for cli_id: <logged-in client id>. i.e., only those docs which have nested docs for cli_id: 0.
To do this, we follow the steps below:
default_only: Use filter aggregation to get documents that do not have nested documents forcli_id: <logged-in client id>, i.e., usingmust_not=>cli_id: <logged-in client id>default_nested: Add sub-aggregation for nested docs since we need to get the docs against sentiment which is a field of the nested document.sentiment_for_cli_id: Add sub-aggregation todefault_nestedaggregation in order to get sentiment only for the default client, i.e., for cli_id: 0.default: Add this terms sub-aggregation tosentiment_for_cli_idaggregation to get counts against the sentiment. Note that this count is of nested docs, and since you always have only one nested doc per cli_id, therefore this count seems to be the count of docs but it is not.the_doc_count: Add thisreverse_nestedaggregation to get out of nested doc aggs and the count of parent docs. We add this as the sub-aggregation ofdefaultaggregation.
sentiment_only
This aggregation gives a count against each sentiment where cli_id: <logged-in client id> is present. For this, we follow the same approach as we followed for default_only aggregation. But with some tweaks as below:
sentiment_only:must=>cli_id: <logged-in client id>sentiment_nested: same reason as abovesentiment_for_cli_id: same but instead of default, we filter forcli_id: <logged-in client id>sentiment: same asdefaultthe_doc_count: same as above
Example:
PUT socialmedia/_bulk
{"index":{"_id":1}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"}]}
{"index":{"_id":2}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Neutral"}]}
{"index":{"_id":3}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Negative"}]}
{"index":{"_id":4}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Neutral"}]}
Query:
GET socialmedia/_search
{
"query": {
"match_all": {}
},
"aggs": {
"default_only": {
"filter": {
"bool": {
"must_not": [
{
"nested": {
"path": "cms",
"query": {
"term": {
"cms.cli_id": 2
}
}
}
]
}
},
"aggs": {
"default_nested": {
"nested": {
"path": "cms"
},
"aggs": {
"sentiment_for_cli_id": {
"filter": {
"term": {
"cms.cli_id": 0
}
},
"aggs": {
"default": {
"terms": {
"field": "cms.cmx_sentiment"
},
"aggs": {
"the_doc_count": {
"reverse_nested": {}
}
}
}
}
}
}
}
},
"sentiment_only": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "cms",
"query": {
"term": {
"cms.cli_id": 2
}
}
}
]
}
},
"aggs": {
"sentiment_nested": {
"nested": {
"path": "cms"
},
"aggs": {
"sentiment_for_cli_id": {
"filter": {
"term": {
"cms.cli_id": 2
}
},
"aggs": {
"sentiment": {
"terms": {
"field": "cms.cmx_sentiment"
},
"aggs": {
"the_doc_count": {
"reverse_nested": {}
}
}
}
}
}
}
}
}
}
}
}
Agg Output:
"aggregations" : {
"default_only" : {
"doc_count" : 1,
"default_nested" : {
"doc_count" : 1,
"sentiment_for_cli_id" : {
"doc_count" : 1,
"default" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Positive",
"doc_count" : 1,
"the_doc_count" : {
"doc_count" : 1
}
}
]
}
}
}
},
"sentiment_only" : {
"doc_count" : 3,
"sentiment_nested" : {
"doc_count" : 6,
"sentiment_for_cli_id" : {
"doc_count" : 3,
"sentiment" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Neutral",
"doc_count" : 2,
"the_doc_count" : {
"doc_count" : 2
}
},
{
"key" : "Negative",
"doc_count" : 1,
"the_doc_count" : {
"doc_count" : 1
}
}
]
}
}
}
}
}
英文:
After the discussion with OP I'm rewriting this answer.
To get the desired result you will have to consider the following to build the query and aggregation:
###Query:
This will contain any filter applied by logged in user. For the example purpose I'm using match_all since every document has atleast one nested doc against cms field i.e. for cli_id: 0
###Aggregation:
Here we have to divide the aggregations into two:
- default_only
 - sentiment_only
 
####default_only
In this aggregation we find count for those document which don't have nested document for cli_id: <logged in client id>. i.e. only those docs which have nested doc for cli_id: 0.
To do this we follow the steps below:
default_onlyUse filter aggregation to get document which does not have nested document forcli_id: <logged in client id>i.e. usingmust_not=>cli_id: <logged in client id>default_nested: Add sub aggregation for nested docs since we need to get the docs against sentiment which is field of nested document.sentiment_for_cli_id: Add sub aggregation todefault_nestedaggregation in order to get sentiment only for default client i.e. for cli_id: 0.default: Add this terms sub aggregation tosentiment_for_cli_idaggregation to get counts against the sentiment. Note that this count is of nested docs and since you always have only one nested doc per cli_id therefore this count seems to be the count of docs but it is not.the_doc_count: Add thisreverse_nestedaggregation to get out of nested doc aggs and the count of parent docs. We add this as the sub aggregation ofdefaultaggregation.
####sentiment_only
This aggregation give count against each sentiment where cli_id: <logged in client id> is present. For this we follow the same approach as we followed for default_only aggregation. But with some tweaks as below:
sentiment_only:must=>cli_id: <logged in client id>sentiment_nested: same reason as abovesentiment_for_cli_id: same but instead of default we filter forcli_id: <logged in client id>sentiment: same asdefaultthe_doc_count: same as above
###Example:
PUT socialmedia/_bulk
{"index":{"_id": 1}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"}]}
{"index":{"_id": 2}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Neutral"}]}
{"index":{"_id": 3}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Negative"}]}
{"index":{"_id": 4}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Neutral"}]}
####Query:
GET socialmedia/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "default_only": {
      "filter": {
        "bool": {
          "must_not": [
            {
              "nested": {
                "path": "cms",
                "query": {
                  "term": {
                    "cms.cli_id": 2
                  }
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "default_nested": {
          "nested": {
            "path": "cms"
          },
          "aggs": {
            "sentiment_for_cli_id": {
              "filter": {
                "term": {
                  "cms.cli_id": 0
                }
              },
              "aggs": {
                "default": {
                  "terms": {
                    "field": "cms.cmx_sentiment"
                  },
                  "aggs": {
                    "the_doc_count": {
                      "reverse_nested": {}
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "sentiment_only": {
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "cms",
                "query": {
                  "term": {
                    "cms.cli_id": 2
                  }
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "sentiment_nested": {
          "nested": {
            "path": "cms"
          },
          "aggs": {
            "sentiment_for_cli_id": {
              "filter": {
                "term": {
                  "cms.cli_id": 2
                }
              },
              "aggs": {
                "sentiment": {
                  "terms": {
                    "field": "cms.cmx_sentiment"
                  },
                  "aggs": {
                    "the_doc_count": {
                      "reverse_nested": {}
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
####Agg Output:
 "aggregations" : {
    "default_only" : {
      "doc_count" : 1,
      "default_nested" : {
        "doc_count" : 1,
        "sentiment_for_cli_id" : {
          "doc_count" : 1,
          "default" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Positive",
                "doc_count" : 1,
                "the_doc_count" : {
                  "doc_count" : 1
                }
              }
            ]
          }
        }
      }
    },
    "sentiment_only" : {
      "doc_count" : 3,
      "sentiment_nested" : {
        "doc_count" : 6,
        "sentiment_for_cli_id" : {
          "doc_count" : 3,
          "sentiment" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Neutral",
                "doc_count" : 2,
                "the_doc_count" : {
                  "doc_count" : 2
                }
              },
              {
                "key" : "Negative",
                "doc_count" : 1,
                "the_doc_count" : {
                  "doc_count" : 1
                }
              }
            ]
          }
        }
      }
    }
  }
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论