Filter not working as expected in Elasticsearch

Question

My Elasticsearch document looks like this:

{
    "_index": "terms",
    "_type": "_doc",
    "_id": "30028861",
    "_version": 36,
    "_seq_no": 338988,
    "_primary_term": 6,
    "found": true,
    "_source": {
        "title": {
            "value": "Credit Card Number",
            "is_changed": false
        },
        "node_id": 30028861,
        "domain_id": 30028838,

        "node_name": "Credit Card Number",
        "parent_id": 30028839,
        "term_name": "Credit Card Number",
        "account_id": 30028807,
       
        "published_status": "Published",
        "published_status_id": "PUB",
        "node_sub_type": "TRM"
    }
}

I am trying to add a filter clause with a must_not condition.

My query is:

{
   "from":0,
   "size":10,
   "track_total_hits":true,
   "query":{
      "bool":{
         "must":[
            {
               "match":{
                  "account_id":30028807
               }
            }
         ],
         "filter":[
            {
               "bool":{
                  "should":[
                     {
                        "terms":{
                           "domain_id":[
                              30119066,
                              30123338
                           ]
                        }
                     },
                     {
                        "terms":{
                           "parent_id":[
                              30028839
                           ]
                        }
                     }
                  ]
               }
            },
            {
               "bool":{
                  "must_not":[
                     {
                        "terms":{
                           "published_status_id":[
                              "PUB"
                           ]
                        }
                     },
                     {
                        "terms":{
                           "published_status":[
                              "Published"
                           ]
                        }
                     }
                  ]
               }
            }
         ],
         "must_not":{
            "terms":{
               "published_status.raw":[
                  "Disabled"
               ]
            }
         }
      }
   },
   "_source":[
      "node_type",
      "node_id",
      "parent_name",
      "table_name"
   ]
}

The must_not inside the filter is meant to exclude any document whose published_status_id is PUB or whose published_status is Published, but the query still returns documents that contain both values. The whole intention of the filter clause is to return a document only when either domain_id or parent_id matches, published_status_id is not PUB, and published_status is not Published. Instead, it returns documents with status PUB.

Answer 1

Score: 1

I see you have published_status.raw, which is probably a keyword field, so published_status itself is probably a text field. In that case, the clause in the must_not should probably be:

                                       add this
                 {                        |
                    "terms":{             v
                       "published_status.raw":[
                          "Published"
                       ]
                    }
                 }
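
Some background on why the sub-field matters. The question does not show the index mapping, so the mapping below is only an assumption inferred from the published_status.raw field name, not the asker's actual mapping. A text field is analyzed at index time, and with the default standard analyzer the value "Published" is indexed as the lowercase token published. A terms query is not analyzed, so it looks for the exact term "Published", finds nothing on the text field, and the must_not therefore excludes nothing. A keyword sub-field stores the original string unchanged, which is what an exact terms match needs.

```json
{
  "mappings": {
    "properties": {
      "published_status": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}
```

Retrieving the real mapping of the terms index is the way to confirm which of the two cases applies to each field.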

Also make sure to check published_status_id, which might suffer from the same issue. You might need to use published_status_id.raw in order to get an exact terms match.
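
Putting both suggestions together, the query might look like the sketch below. It assumes that both status fields are text fields with a raw keyword sub-field. Only published_status.raw appears in the question; published_status_id.raw is hypothetical, and if published_status_id is already mapped as keyword, the plain field name should be kept instead. As a simplification that is not part of the original answer, the nested must_not from the filter is merged into the top-level must_not, and "Published" and "Disabled" share one terms clause. This should select the same documents, because a document is excluded as soon as it matches any must_not clause.

```json
{
  "from": 0,
  "size": 10,
  "track_total_hits": true,
  "query": {
    "bool": {
      "must": [
        { "match": { "account_id": 30028807 } }
      ],
      "filter": [
        {
          "bool": {
            "should": [
              { "terms": { "domain_id": [30119066, 30123338] } },
              { "terms": { "parent_id": [30028839] } }
            ]
          }
        }
      ],
      "must_not": [
        { "terms": { "published_status_id.raw": ["PUB"] } },
        { "terms": { "published_status.raw": ["Published", "Disabled"] } }
      ]
    }
  },
  "_source": ["node_type", "node_id", "parent_name", "table_name"]
}
```

With this shape, the sample document from the question (parent_id 30028839, status PUB / Published) would pass the should filter but be dropped by the must_not, provided the assumed sub-fields exist.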

huangapple
  • Posted on June 22, 2023, 05:39:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76527327.html