Keyword analyzer for product-codes

Question

I am trying to implement a search index for some products, where their product-code should be searchable using wildcards.
An example of such a product-code could be something like this: TTY1012-0088-VTX1.

I have set up the product-code field to be searchable and to use the keyword analyzer.
(Using the test analyzer API I can see that the product-code is tokenized as one token.)

But when I try to search for the product, using the Azure Cognitive Search Explorer, with this query: queryType=full&search=/TTY-10.*/, nothing is found.
Notice that I am using a Lucene regex query for the search and the search query contains a hyphen (-)!

What am I doing wrong since I can't get this search to return anything?


NOTE

I gather from this answer that, when using the keyword analyzer, only exact-match searches are possible.
I have not been able to find anything confirming this in the official documentation, so is it correct?


EDIT 1

After working on the problem a bit more, it seems that the hyphens in the product-codes are the problem.
If the wildcard search includes a hyphen, no results are found.
The question has been updated with this detail.

I have thought about solving the problem by removing hyphens from the product-code and incoming search requests.
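
The hyphen-stripping idea could be sketched roughly as follows. This is only an illustration: the field name `productCodeNormalized` and both helper functions are hypothetical, and it assumes the same normalization is applied to the value at indexing time and to the user's input at query time.

```python
def normalize_product_code(value: str) -> str:
    """Remove hyphens so indexed codes and search input share one form."""
    return value.replace("-", "")


def build_prefix_query(user_input: str, field: str = "productCodeNormalized") -> str:
    """Build a fielded prefix query from raw input (hypothetical field name).

    No escaping of other query-syntax characters is done here.
    """
    return f"{field}:{normalize_product_code(user_input)}*"


# Value that would be stored in the extra, hyphen-free field.
indexed_value = normalize_product_code("TTY1012-0088-VTX1")
# Search text built from what the user typed, hyphen included.
query = build_prefix_query("TTY1012-00")

print(indexed_value)
print(query)
```

The trade-off is an extra field in the index, plus the need to keep the normalization identical on both sides.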

Answer 1

Score: 1

According to the official documentation, for regular expression, wildcard, and fuzzy search, analyzers aren't used at query time. For these query forms, which the parser detects by the presence of operators and delimiters, the query string is passed to the engine without lexical analysis, and the analyzer specified on the field is ignored. In this case, that applies to the "keyword" analyzer you are using.

You can use the "keyword" analyzer on one field and the Standard Lucene analyzer on another, so that your wildcard query works.

If you search for TTY* against the field that uses the Standard Lucene analyzer, your query should work, since that analyzer splits the product code into tokens. Here is part of the index definition with both analyzers:

[Screenshot: partial index definition with both analyzers]
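
As a rough sketch of what a two-analyzer index definition might look like, the snippet below builds one as a Python dictionary and prints it as JSON. The index name and all field names are made up for illustration; only the analyzer names `keyword` and `standard.lucene` are taken from the service's built-in analyzers. It is not the answerer's actual definition.

```python
import json

# Hypothetical index and field names, chosen only for this example.
index_definition = {
    "name": "products",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        # Whole product code kept as a single token.
        {"name": "productCode", "type": "Edm.String",
         "searchable": True, "analyzer": "keyword"},
        # Same value, tokenized by the Standard Lucene analyzer.
        {"name": "productCodeStandard", "type": "Edm.String",
         "searchable": True, "analyzer": "standard.lucene"},
    ],
}


def analyzers_by_field(definition: dict) -> dict:
    """Map each field that declares an analyzer to that analyzer's name."""
    return {f["name"]: f["analyzer"]
            for f in definition["fields"] if "analyzer" in f}


print(json.dumps(index_definition, indent=2))
print(analyzers_by_field(index_definition))
```

Both code fields would be populated with the same product code at indexing time, so either can be targeted depending on the kind of query.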

Here is the wildcard query that returns that product code:

[Screenshot: wildcard query]
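
A wildcard query of this kind might be assembled as below. This is a sketch under assumptions: `productCodeStandard` is a hypothetical name for the field that uses the Standard Lucene analyzer, and only the `queryType` and `search` parameters from the question are used.

```python
from urllib.parse import parse_qs, urlencode


def build_search_params(search: str) -> str:
    """URL-encode a full Lucene query as Search Explorer-style parameters."""
    return urlencode({"queryType": "full", "search": search})


# Fielded prefix query against the tokenized field (hypothetical field name).
wildcard_params = build_search_params("productCodeStandard:TTY*")
print(wildcard_params)

# Decoding the string shows the query text survives the round trip.
print(parse_qs(wildcard_params)["search"][0])
```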

Here is a query using the whole product code, which matches because the keyword field keeps the dashes as part of the token:

[Screenshot: query with the whole product code]
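
One possible way to express a whole-code lookup is a quoted, fielded query, sketched below. The field name `productCode` and the helper are hypothetical. The quoting is an assumption about how to keep the hyphens from being read as operators, not something shown in the original answer.

```python
def exact_code_query(code: str, field: str = "productCode") -> str:
    """Wrap a full product code in double quotes for a fielded query.

    Backslashes and double quotes inside the code are escaped so the
    quoted phrase stays well formed.
    """
    escaped = code.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


print(exact_code_query("TTY1012-0088-VTX1"))
```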

[Edited to add the following, in response to the latest edit to the question]:
If you use the same approach without the wildcard, searching only for the partial term, this same configuration will work. Example:

[Screenshot: partial-term query]
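
To see why a partial term can match without a wildcard, the following simplified simulation may help. It is not the real Lucene tokenizer: it assumes the Standard Lucene analyzer splits on hyphens and lowercases, and that the keyword analyzer emits the whole value unchanged as one token.

```python
import re


def simulate_standard_tokens(text: str) -> list:
    """Rough stand-in for the standard analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def simulate_keyword_tokens(text: str) -> list:
    """Rough stand-in for the keyword analyzer: the whole value is one token."""
    return [text]


code = "TTY1012-0088-VTX1"
standard_tokens = simulate_standard_tokens(code)
keyword_tokens = simulate_keyword_tokens(code)

# A plain term is analyzed like the field value, then compared to whole tokens.
query_term = simulate_standard_tokens("TTY1012")[0]

print(standard_tokens)
print(query_term in standard_tokens)   # the partial code is a token of its own
print("TTY1012" in keyword_tokens)     # the keyword field holds only the full code
```

In this simplified model, each hyphen-separated segment is searchable on its own in the standard field, while the keyword field only matches the complete code.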

Posted on May 24, 2023 at 20:55:22