Elasticsearch query on a nested field with condition
Question
Elasticsearch v7.0
Hello and good day!
I'm trying to create a query with a condition: if a nested field has only one element, get that first element; if it has two or more elements, get the element that matches a nested field condition.
Scenario:
I have an index named `socialmedia` with a nested field named `cms`, which stores a sentiment for each document.
An example document's `cms` field looks like this:
"_id" : 1,
"cms" : [
  {
    "cli_id" : 0,
    "cmx_sentiment" : "Negative"
  }
]
This `cms` field contains `"cli_id" : 0` by default as its first element (meaning it is visible to all clients/users), but sooner or later it looks like this:
"_id": 1,
"cms" : [
  {
    "cli_id" : 0,
    "cmx_sentiment" : "Negative"
  },
  {
    "cli_id" : 1,
    "cmx_sentiment" : "Positive"
  },
  {
    "cli_id" : 2,
    "cmx_sentiment" : "Neutral"
  }
]
The 2nd and 3rd elements show that the clients with `cli_id` 1 and 2 have set a sentiment for that document.
Now I want to formulate a query such that, if the logged-in client has no sentiment yet for a specific document, it fetches the `cmx_sentiment` of the element with `"cli_id" : 0`.
BUT, if the logged-in client has a sentiment for a document fetched by his filters, the query fetches the `cmx_sentiment` whose `cli_id` matches the logged-in client.
For example:
- the client with a `cli_id` of 2 will get the `cmx_sentiment` **Neutral** for the document above
- the client with a `cli_id` of 5 will get the `cmx_sentiment` **Negative**, because he hasn't given a sentiment to the document
PSEUDO CODE:
If a document has a sentiment set by the client, get the `cmx_sentiment` of the element whose `cli_id` == the client's ID.
If a document is fresh or the client HAS NOT yet labeled a sentiment on that document, get the `cmx_sentiment` of the element that has `cli_id` == 0.
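The rule in the pseudo code can be sketched for a single document in plain Python. This only illustrates the intended selection logic and is not an Elasticsearch query; the function name `pick_sentiment` and the constant `DEFAULT_CLI_ID` are made up for this sketch.

```python
DEFAULT_CLI_ID = 0  # assumption: cli_id 0 marks the default sentiment shown to everyone


def pick_sentiment(cms, client_id):
    """Return the client's own sentiment if present, else the default one."""
    by_client = {entry["cli_id"]: entry["cmx_sentiment"] for entry in cms}
    if client_id in by_client:
        return by_client[client_id]
    # Fall back to the default element; None if the document has no default.
    return by_client.get(DEFAULT_CLI_ID)


cms = [
    {"cli_id": 0, "cmx_sentiment": "Negative"},
    {"cli_id": 1, "cmx_sentiment": "Positive"},
    {"cli_id": 2, "cmx_sentiment": "Neutral"},
]
print(pick_sentiment(cms, 2))  # client 2 has labeled the document
print(pick_sentiment(cms, 5))  # client 5 has not, so the default applies
```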
I need a query that implements the pseudo code above.
Here's my sample query:
"aggs" => [
    "CMS" => [
        "nested" => [
            "path" => "cms",
        ],
        "aggs" => [
            "FILTER" => [
                "filter" => [
                    "bool" => [
                        "should" => [
                            [
                                "match" => [
                                    "cms.cli_id" => 0
                                ]
                            ],
                            [
                                "bool" => [
                                    "must" => [
                                        [
                                            // I'm planning to add a bool clause here to test whether cli_id equals the logged-in client's ID
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
                "aggs" => [
                    "TONALITY" => [
                        "terms" => [
                            "field" => "cms.cmx_sentiment"
                        ],
                    ]
                ]
            ]
        ]
    ]
]
Is my query correct?
The problem with the query I have provided is that it SUMS all the matching elements instead of picking only one.
The query above produces this scenario:
- The client with `cli_id` 2 logs in
- Both the Neutral and Negative `cmx_sentiment` values are retrieved, instead of Neutral alone
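Why both values come back can be seen by emulating the filter outside Elasticsearch. A `should` clause over `cli_id` 0 and the logged-in client's ID keeps every nested element that matches either condition, so a document the client has labeled contributes two elements to the terms buckets. The sketch below mimics that in plain Python; `should_filter_counts` is a hypothetical helper, and it assumes the empty `must` gets filled with a match on the client's ID.

```python
from collections import Counter


def should_filter_counts(docs, client_id):
    """Count sentiments of nested elements matching cli_id 0 OR client_id."""
    wanted = {0, client_id}
    return Counter(
        entry["cmx_sentiment"]
        for doc in docs
        for entry in doc["cms"]
        if entry["cli_id"] in wanted
    )


docs = [
    {
        "cms": [
            {"cli_id": 0, "cmx_sentiment": "Negative"},
            {"cli_id": 1, "cmx_sentiment": "Positive"},
            {"cli_id": 2, "cmx_sentiment": "Neutral"},
        ]
    }
]
# Both the default element and client 2's element are counted.
print(dict(should_filter_counts(docs, 2)))
```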
Answer 1
Score: 1
After the discussion with OP I'm rewriting this answer.
To get the desired result you will have to consider the following to build the query and aggregation:
###Query:
This will contain any filter applied by the logged-in user. For the purposes of the example I'm using `match_all`, since every document has at least one nested doc in the `cms` field, i.e. the one for `cli_id: 0`.
###Aggregation:
Here we have to divide the aggregations into two:
- default_only
- sentiment_only
####default_only
In this aggregation we count the documents which don't have a nested document for `cli_id: <logged in client id>`, i.e. only those docs which have a nested doc for `cli_id: 0`.
To do this we follow the steps below:
- `default_only`: Use a filter aggregation to get the documents which do not have a nested document for `cli_id: <logged in client id>`, i.e. using `must_not` => `cli_id: <logged in client id>`.
- `default_nested`: Add a sub-aggregation for nested docs, since we need to group the docs by sentiment, which is a field of the nested document.
- `sentiment_for_cli_id`: Add a sub-aggregation to the `default_nested` aggregation in order to get the sentiment only for the default client, i.e. for `cli_id: 0`.
- `default`: Add this terms sub-aggregation to the `sentiment_for_cli_id` aggregation to get counts per sentiment. Note that this is a count of nested docs; since you always have only one nested doc per `cli_id`, it looks like the count of parent docs, but it is not.
- `the_doc_count`: Add this `reverse_nested` aggregation to step out of the nested doc aggs and get the count of parent docs. We add it as a sub-aggregation of the `default` aggregation.
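The steps above can be sketched as a small Python function that assembles the `default_only` aggregation for any client ID. The function name `build_default_only` and its parameters are assumptions made for this sketch; the aggregation names and field names are the ones used in this answer.

```python
import json


def build_default_only(client_id, default_cli_id=0):
    """Aggregation for documents the given client has NOT labeled."""
    return {
        # Parent-level filter: exclude documents with a nested element for the client.
        "filter": {
            "bool": {
                "must_not": [
                    {
                        "nested": {
                            "path": "cms",
                            "query": {"term": {"cms.cli_id": client_id}},
                        }
                    }
                ]
            }
        },
        "aggs": {
            "default_nested": {
                "nested": {"path": "cms"},
                "aggs": {
                    "sentiment_for_cli_id": {
                        # Nested-level filter: keep only the default element.
                        "filter": {"term": {"cms.cli_id": default_cli_id}},
                        "aggs": {
                            "default": {
                                "terms": {"field": "cms.cmx_sentiment"},
                                # Step back to parent documents for the real doc count.
                                "aggs": {"the_doc_count": {"reverse_nested": {}}},
                            }
                        },
                    }
                },
            }
        },
    }


print(json.dumps({"aggs": {"default_only": build_default_only(2)}}, indent=2))
```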
####sentiment_only
This aggregation gives the count for each sentiment where `cli_id: <logged in client id>` is present. For this we follow the same approach as for the `default_only` aggregation, with some tweaks as below:
- `sentiment_only`: `must` => `cli_id: <logged in client id>`
- `sentiment_nested`: same reason as above
- `sentiment_for_cli_id`: same, but instead of the default we filter for `cli_id: <logged in client id>`
- `sentiment`: same as `default`
- `the_doc_count`: same as above
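As a rough sketch of these tweaks, the Python function below assembles the `sentiment_only` aggregation for a given client ID. The names `nested_cli_term` and `build_sentiment_only` are hypothetical helpers introduced here; the client ID appears twice, once in the parent-level `must` filter and once in the nested-level filter.

```python
def nested_cli_term(cli_id):
    """Term clause on the nested cli_id field (used in both filters)."""
    return {"term": {"cms.cli_id": cli_id}}


def build_sentiment_only(client_id):
    """Aggregation for documents the given client HAS labeled."""
    return {
        # Parent-level filter: keep documents with a nested element for the client.
        "filter": {
            "bool": {
                "must": [
                    {"nested": {"path": "cms", "query": nested_cli_term(client_id)}}
                ]
            }
        },
        "aggs": {
            "sentiment_nested": {
                "nested": {"path": "cms"},
                "aggs": {
                    "sentiment_for_cli_id": {
                        # Nested-level filter: keep only the client's own element.
                        "filter": nested_cli_term(client_id),
                        "aggs": {
                            "sentiment": {
                                "terms": {"field": "cms.cmx_sentiment"},
                                "aggs": {"the_doc_count": {"reverse_nested": {}}},
                            }
                        },
                    }
                },
            }
        },
    }


agg = build_sentiment_only(2)
print(agg["filter"]["bool"]["must"][0]["nested"]["query"])
```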
###Example:
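The nested query and nested aggregation used in this example only work if `cms` is mapped as `nested` before the documents are indexed; with dynamic mapping it would become a plain `object` field. Below is a minimal sketch of such a mapping, built as a Python dict that could serve as the body of `PUT socialmedia`. The types chosen for `cli_id` (`integer`) and `cmx_sentiment` (`keyword`) are assumptions, since the original post does not show the mapping.

```python
import json

# Assumed mapping for the `socialmedia` index (Elasticsearch 7.x, typeless mappings).
SOCIALMEDIA_INDEX_BODY = {
    "mappings": {
        "properties": {
            "cms": {
                "type": "nested",
                "properties": {
                    "cli_id": {"type": "integer"},
                    # keyword so that the terms aggregation can bucket on it
                    "cmx_sentiment": {"type": "keyword"},
                },
            }
        }
    }
}

# Print the JSON body to send when creating the index.
print(json.dumps(SOCIALMEDIA_INDEX_BODY, indent=2))
```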
PUT socialmedia/_bulk
{"index":{"_id": 1}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"}]}
{"index":{"_id": 2}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Neutral"}]}
{"index":{"_id": 3}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Negative"}]}
{"index":{"_id": 4}}
{"cms":[{"cli_id":0,"cmx_sentiment":"Positive"},{"cli_id":2,"cmx_sentiment":"Neutral"}]}
####Query:
GET socialmedia/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "default_only": {
      "filter": {
        "bool": {
          "must_not": [
            {
              "nested": {
                "path": "cms",
                "query": {
                  "term": {
                    "cms.cli_id": 2
                  }
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "default_nested": {
          "nested": {
            "path": "cms"
          },
          "aggs": {
            "sentiment_for_cli_id": {
              "filter": {
                "term": {
                  "cms.cli_id": 0
                }
              },
              "aggs": {
                "default": {
                  "terms": {
                    "field": "cms.cmx_sentiment"
                  },
                  "aggs": {
                    "the_doc_count": {
                      "reverse_nested": {}
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "sentiment_only": {
      "filter": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "cms",
                "query": {
                  "term": {
                    "cms.cli_id": 2
                  }
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "sentiment_nested": {
          "nested": {
            "path": "cms"
          },
          "aggs": {
            "sentiment_for_cli_id": {
              "filter": {
                "term": {
                  "cms.cli_id": 2
                }
              },
              "aggs": {
                "sentiment": {
                  "terms": {
                    "field": "cms.cmx_sentiment"
                  },
                  "aggs": {
                    "the_doc_count": {
                      "reverse_nested": {}
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
####Agg Output:
"aggregations" : {
  "default_only" : {
    "doc_count" : 1,
    "default_nested" : {
      "doc_count" : 1,
      "sentiment_for_cli_id" : {
        "doc_count" : 1,
        "default" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "Positive",
              "doc_count" : 1,
              "the_doc_count" : {
                "doc_count" : 1
              }
            }
          ]
        }
      }
    }
  },
  "sentiment_only" : {
    "doc_count" : 3,
    "sentiment_nested" : {
      "doc_count" : 6,
      "sentiment_for_cli_id" : {
        "doc_count" : 3,
        "sentiment" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "Neutral",
              "doc_count" : 2,
              "the_doc_count" : {
                "doc_count" : 2
              }
            },
            {
              "key" : "Negative",
              "doc_count" : 1,
              "the_doc_count" : {
                "doc_count" : 1
              }
            }
          ]
        }
      }
    }
  }
}
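Since the result arrives in two branches, the client code still has to combine them to get one tonality count per sentiment. A minimal sketch of that step in Python follows, using the parent-document counts from `the_doc_count`; the function name `merge_tonality` is an assumption, and the sample dict is a trimmed copy of the output above.

```python
def merge_tonality(aggregations):
    """Combine both branches into one {sentiment: parent doc count} dict."""
    branches = [
        aggregations["default_only"]["default_nested"]["sentiment_for_cli_id"]["default"],
        aggregations["sentiment_only"]["sentiment_nested"]["sentiment_for_cli_id"]["sentiment"],
    ]
    totals = {}
    for branch in branches:
        for bucket in branch["buckets"]:
            count = bucket["the_doc_count"]["doc_count"]
            totals[bucket["key"]] = totals.get(bucket["key"], 0) + count
    return totals


# Trimmed copy of the aggregation output (only the fields that are read).
aggregations = {
    "default_only": {"default_nested": {"sentiment_for_cli_id": {"default": {"buckets": [
        {"key": "Positive", "doc_count": 1, "the_doc_count": {"doc_count": 1}},
    ]}}}},
    "sentiment_only": {"sentiment_nested": {"sentiment_for_cli_id": {"sentiment": {"buckets": [
        {"key": "Neutral", "doc_count": 2, "the_doc_count": {"doc_count": 2}},
        {"key": "Negative", "doc_count": 1, "the_doc_count": {"doc_count": 1}},
    ]}}}},
}
print(merge_tonality(aggregations))  # {'Positive': 1, 'Neutral': 2, 'Negative': 1}
```

The totals add up to 4, the number of indexed documents, because every document falls into exactly one of the two branches.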