How to do a partial text (query_string) match on a nested array of texts?
Question
Is there a way to do partial or subword matching on a nested field that is an array of strings?
I have indexed documents with a nested field that contains an array of texts, e.g.:
"entities": [
  {
    "id": 13,
    "tags": [
      "some other class of tag may be present"
    ]
  }
]
The "entities" field is indexed as a "nested" type with a "tags" property. "tags" is of type "keyword" and has a sub-field "lower_case" of type "text" that uses the "case_insensitive" analyzer.
With a nested query_string that contains the full text of the value, I get a match:
{
  "query": {
    "nested": {
      "query": {
        "bool": {
          "must": [
            {
              "query_string": {
                "query": "some other class of tag may be present",
                "fields": [
                  "entities.tags.lower_case"
                ]
              }
            }
          ]
        }
      },
      "path": "entities"
    }
  }
}
But when I try a partial match (e.g. "some other" or "class"), I get no match. Is there a way to do partial matching on nested arrays in Elasticsearch?
EDIT:
Mapping done in Java:
XContentBuilder docProperties = XContentFactory.jsonBuilder();
docProperties
    .startObject("entities")
        .field("type", "nested")
        .startObject("properties")
            .startObject("tags")
                .field("type", "keyword")
                // multi-field: analyzed text version of each tag
                .startObject("fields")
                    .startObject("lower_case")
                        .field("type", "text")
                        .field("analyzer", "case_insensitive")
                    .endObject()
                .endObject()
            .endObject()
        .endObject()
    .endObject();
Answer 1
Score: 1
Try this:
"settings": {
  "analysis": {
    "analyzer": {
      "case_insensitive": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase"
        ]
      }
    }
  }
}
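You should not use the keyword tokenizer in the "case_insensitive" analyzer.
The keyword tokenizer is a "noop" tokenizer that accepts whatever text it is given and outputs the exact same text as a single term. The whole tag is then indexed as one term, so only a query for the full value can match. With the standard tokenizer the tag is split into individual words, so a partial query such as "class" can match one of them. Documents that are already indexed need to be reindexed before the new analyzer applies to them.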
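To make the difference concrete, here is a minimal Python sketch that imitates the two analyzers. It is only a simulation under simplifying assumptions: the function names (keyword_analyze, standard_analyze, matches) are made up for illustration, the regular expression is a rough stand-in for the standard tokenizer, and the matching rule (any query term equals an indexed term) is a simplification of what query_string does. It does not call Elasticsearch.

```python
import re


def keyword_analyze(text):
    """Rough stand-in for keyword tokenizer + lowercase filter: one single term."""
    return [text.lower()]


def standard_analyze(text):
    """Rough stand-in for standard tokenizer + lowercase filter: one term per word."""
    return re.findall(r"\w+", text.lower())


def matches(query, indexed_terms, analyze):
    """Simplified match: true if any analyzed query term is among the indexed terms."""
    return any(term in indexed_terms for term in analyze(query))


tag = "some other class of tag may be present"

# Terms that would end up in the index for each analyzer (simulated).
keyword_terms = set(keyword_analyze(tag))
standard_terms = set(standard_analyze(tag))

# Keyword tokenizer: only the full value matches.
keyword_full = matches(tag, keyword_terms, keyword_analyze)
keyword_partial = matches("class", keyword_terms, keyword_analyze)

# Standard tokenizer: single words and word groups match as well.
standard_partial = matches("class", standard_terms, standard_analyze)
standard_phrase = matches("some other", standard_terms, standard_analyze)

print("keyword, full value:", keyword_full)
print("keyword, 'class':", keyword_partial)
print("standard, 'class':", standard_partial)
print("standard, 'some other':", standard_phrase)
```

In this simulation the keyword variant only matches the full tag, which mirrors the behavior described in the question, while the standard variant also matches "class" and "some other".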