May 17, 2023, 20:44:36

How to do a partial text (query_string) match on a nested array of texts?

Question

Is there a way to do a partial or subword matching on a nested field that is an array of strings?

I have indexed documents that have a nested field which contains an array of texts, eg:

"entities": [
   {
      "id": 13,
      "tags": [
         "some other class of tag may be present"
      ]
   }
]

The "entities" field is mapped as a "nested" type with a "tags" property. "tags" is a "keyword" field that has a sub-field "lower_case" of type "text" using the case_insensitive analyzer.

With a nested query_string that contains the full value text I can get a match:

{
   "query": {
      "nested": {
         "query": {
            "bool": {
               "must": [
                  {
                     "query_string": {
                        "query": "some other class of tag may be present",
                        "fields": [
                           "entities.tags.lower_case"
                        ]
                     }
                  }
               ]
            }
         },
         "path": "entities"
      }
   }
}

But when I try a partial match (e.g. "some other" or "class"), I don't get a match. Is there a way to do partial matching on nested arrays in Elasticsearch?

EDIT:
The mapping, built in Java:

XContentBuilder docProperties = XContentFactory.jsonBuilder();
docProperties
    .startObject("entities")
        .field("type", "nested")
        .startObject("properties")
            .startObject("tags")
                .field("type", "keyword")
                .startObject("fields")
                    .startObject("lower_case")
                        .field("type", "text")
                        .field("analyzer", "case_insensitive")
                    .endObject()
                .endObject()
            .endObject()
        .endObject()
    .endObject();

Answer 1

Score: 1

Try this:

"settings": {
    "analysis": {
        "analyzer": {
            "case_insensitive": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "lowercase"
                ]
            }
        }
    }
}

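You should not use the keyword tokenizer for this field.

The keyword tokenizer is a "noop" tokenizer that accepts whatever text it is given and outputs the exact same text as a single term. Each tag is therefore indexed as one term, so only a query for the full value can match it. The standard tokenizer splits the text into words, so a query for "class" or "some other" can match the individual terms. Existing documents need to be reindexed after the analyzer is changed.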
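To make the difference concrete, here is a minimal Java sketch that imitates the two analyzers on the tag from the question. It is not Lucene or Elasticsearch code: `TokenizerSketch`, `keywordTokens`, `wordTokens`, and `matches` are hypothetical names, and the word splitter only approximates the standard tokenizer by splitting on anything that is not a letter or digit. It also assumes the original case_insensitive analyzer was built on the keyword tokenizer, which the question does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-ins for two analyzers; this is NOT Lucene or Elasticsearch code.
class TokenizerSketch {

    // Imitates keyword tokenizer + lowercase filter: the whole input becomes one term.
    static List<String> keywordTokens(String text) {
        return List.of(text.toLowerCase(Locale.ROOT));
    }

    // Rough approximation of standard tokenizer + lowercase filter:
    // split on any run of characters that are not letters or digits.
    static List<String> wordTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // A single-term lookup matches only if that exact term was indexed.
    static boolean matches(List<String> indexedTokens, String queryTerm) {
        return indexedTokens.contains(queryTerm.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String tag = "some other class of tag may be present";
        System.out.println(matches(keywordTokens(tag), "class")); // false
        System.out.println(matches(wordTokens(tag), "class"));    // true
        System.out.println(matches(keywordTokens(tag), tag));     // true
    }
}
```

On a real index, the `_analyze` API (a request body with `analyzer` and `text`) shows the terms an analyzer actually produces, which is a more reliable check than this approximation.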
