Substring match in elasticsearch with conditions

Question

I am trying to write an Elasticsearch query that returns all restaurants whose name contains the substring 'pizz' but contains neither 'pizza' nor 'pizzeria' as a separate word.

The query I wrote for this purpose is this:

GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "RestaurantName": {
              "value": "*pizz*"
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "RestaurantName": "pizza"
          }
        },
        {
          "match": {
            "RestaurantName": "pizzeria"
          }
        }
      ]
    }
  }
}

This query also matches names like Instapizza, which is wrong. It should only match names where 'Pizz' starts a capitalized or combined word, such as Fozzie's Pizzaiolo, PizzaVito, or Pizzalicious. How can I change the query so that it no longer matches the unwanted names? Any help would be appreciated.

Answer 1

Score: 2

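When 'RestaurantName' is indexed as a text field, the default "standard" analyzer applies the "lowercase" token filter, so every token stored in Lucene is lowercase and searches on that field are case-insensitive. A wildcard query on a text field is evaluated against those individual tokens, which is why *pizz* matches the token instapizza, while the must_not match clauses only exclude documents that contain pizza or pizzeria as a whole token.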
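To make that behaviour concrete, here is a minimal Python sketch that imitates the original query without a cluster. It is only an approximation: the analyze function is a hypothetical stand-in for the standard analyzer (a simple word split plus lowercasing), fnmatchcase stands in for the wildcard query, and "Pizza Hut" is a made-up sample name.

```python
import re
from fnmatch import fnmatchcase


def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, then lowercase.

    Assumption: apostrophes stay inside a word, so "Fozzie's" is one token.
    """
    return [token.lower() for token in re.findall(r"[A-Za-z0-9']+", text)]


def original_query(name):
    """Imitate the bool query from the question, run against the text field."""
    tokens = analyze(name)
    # wildcard on a text field: the pattern is tested against each indexed token
    must = any(fnmatchcase(token, "*pizz*") for token in tokens)
    # match "pizza" / "pizzeria": only hits a token equal to the whole word
    must_not = any(token in ("pizza", "pizzeria") for token in tokens)
    return must and not must_not


names = ["Instapizza", "PizzaVito", "Pizza Hut", "Fozzie's Pizzaiolo"]
matched = [name for name in names if original_query(name)]
print(matched)
```

Under these assumptions "Instapizza" is returned, because its only token is instapizza: it satisfies the wildcard and is not equal to pizza, so must_not never fires.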

First, add a keyword sub-field to the RestaurantName field so that the original, case-preserving value is indexed as well:

{
    "mappings": {
        "properties": {
            "RestaurantName": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            }
        }
    }
}

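Then run the wildcard against the keyword sub-field, where the pattern is case-sensitive, and keep the must_not clauses on the analyzed text field: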

{
    "query": {
        "bool": {
            "must": [
                {
                    "wildcard": {
                        "RestaurantName.keyword": {
                            "value": "*Pizz*"
                        }
                    }
                }
            ],
            "must_not": [
                {
                    "match": {
                        "RestaurantName": "pizza"
                    }
                },
                {
                    "match": {
                        "RestaurantName": "pizzeria"
                    }
                }
            ]
        }
    }
}
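The logic of this query can be checked offline with a short Python sketch. It is a simulation under stated assumptions, not the Elasticsearch implementation: fnmatchcase approximates the case-sensitive wildcard on the keyword sub-field, lowercase_tokens is a hypothetical stand-in for the standard analyzer, and "Instapizza", "Pizza Hut" and "Pizzeria Roma" are sample names added for illustration.

```python
import re
from fnmatch import fnmatchcase


def lowercase_tokens(text):
    """Hypothetical approximation of the analyzed text field: words, lowercased."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9']+", text)]


def fixed_query(name):
    """Imitate the bool query that combines both fields."""
    # RestaurantName.keyword holds the whole original string with its case intact,
    # so the pattern *Pizz* requires a capital P.
    must = fnmatchcase(name, "*Pizz*")
    # RestaurantName (text) is still used to exclude the whole words.
    must_not = any(t in ("pizza", "pizzeria") for t in lowercase_tokens(name))
    return must and not must_not


names = [
    "Instapizza",
    "Pizza Hut",
    "Pizzeria Roma",
    "Fozzie's Pizzaiolo",
    "PizzaVito",
    "Pizzalicious",
]
hits = [name for name in names if fixed_query(name)]
print(hits)
```

Note that this approach depends on the capital P: a name written entirely in lowercase, such as "pizzaiolo", would not match the *Pizz* pattern either.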

The result is:

{
    "took": 8,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "pizza",
                "_type": "_doc",
                "_id": "1L6ob4cB6Rdc8HbDY8vi",
                "_score": 1.0,
                "_source": {
                    "RestaurantName": "Fozzie's Pizzaiolo"
                }
            },
            {
                "_index": "pizza",
                "_type": "_doc",
                "_id": "1b6ob4cB6Rdc8HbDg8tA",
                "_score": 1.0,
                "_source": {
                    "RestaurantName": "PizzaVito"
                }
            },
            {
                "_index": "pizza",
                "_type": "_doc",
                "_id": "1r6ob4cB6Rdc8HbDmMuJ",
                "_score": 1.0,
                "_source": {
                    "RestaurantName": "Pizzalicious"
                }
            }
        ]
    }
}

Posted by huangapple on April 11, 2023, 07:21:29
When reposting, please keep the link to this article: https://go.coder-hub.com/75981411.html