How to do a partial text (query_string) match on a nested array of texts?

Question

Is there a way to do partial or subword matching on a nested field that is an array of strings?

I have indexed documents with a nested field that contains an array of texts, e.g.:

"entities": [
   {
      "id": 13,
      "tags": [
         "some other class of tag may be present"
      ]
   }
]

The "entities" field is mapped as a "nested" type with a "tags" property. "tags" is of type "keyword" and has a sub-field "lower_case" of type "text" that uses the case_insensitive analyzer.

With a nested query_string query that contains the full text of the value, I get a match:

{
   "query":{
      "nested":{
         "query":{
            "bool":{
               "must":[
                  {
                     "query_string":{
                        "query":"some other class of tag may be present",
                        "fields":[
                           "entities.tags.lower_case"
                        ]
                     }
                  }
               ]
            }
         },
         "path":"entities"
      }
   }
}

But when I try a partial match (e.g. "some other" or "class"), I don't get a match. Is there a way to do partial matching on nested arrays in Elasticsearch?

EDIT:
Mapping done in Java:

XContentBuilder docProperties = XContentFactory.jsonBuilder();
docProperties
    .startObject("entities")
        .field("type", "nested")
        .startObject("properties")
            .startObject("tags")
                .field("type", "keyword")
                .startObject("fields")
                    .startObject("lower_case")
                        .field("type", "text")
                        .field("analyzer", "case_insensitive")
                    .endObject()
                .endObject()
            .endObject()
        .endObject()
    .endObject();

Answer 1

Score: 1

Try this:

"settings": {
    "analysis": {
        "analyzer": {
            "case_insensitive": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "lowercase"
                ]
            }
        }
    }
}

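You should not use the keyword tokenizer for this analyzer. With it, each tag is indexed as one single term, so a query_string for "class" or "some other" finds no matching term. The standard tokenizer splits the tag into individual words, which can then be matched separately.

The keyword tokenizer is a “noop” tokenizer that accepts whatever text it is given and outputs the exact same text as a single term.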
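
To make the difference concrete, here is a minimal Java sketch. It only imitates the effect of the two tokenizers followed by a lowercase filter; it is not Lucene's or Elasticsearch's actual code, and the class and method names (TokenizerSketch, keywordTokens, standardTokens, matches) are made up for illustration. The standard tokenizer is approximated by splitting on anything that is not a letter or digit, which is close enough for the sample tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: mimics the effect of the keyword and standard
// tokenizers (plus lowercasing). Not the real Lucene implementation.
class TokenizerSketch {

    // Keyword-style analysis: the whole input becomes one lowercased term.
    static List<String> keywordTokens(String text) {
        return List.of(text.toLowerCase(Locale.ROOT));
    }

    // Standard-style analysis, approximated here by splitting on every
    // run of non-letter, non-digit characters and lowercasing each word.
    static List<String> standardTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A query term matches only if it is exactly equal to an indexed term.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String tag = "some other class of tag may be present";

        // One term: the entire tag. Only the full text can match.
        System.out.println(matches(keywordTokens(tag), "class"));  // false
        System.out.println(matches(keywordTokens(tag), tag));      // true

        // Eight terms: each word can be matched on its own.
        System.out.println(matches(standardTokens(tag), "class")); // true
    }
}
```

This mirrors what the question describes: with a single-term index entry only the full value matches, while word-level terms allow "class" to match. Documents indexed before the analyzer change still hold the old terms, so they would presumably need to be reindexed for the new tokenization to apply.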

huangapple
  • Published on May 17, 2023, 20:44:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76272259.html