MariaDB REGEX with escaped characters

Question

I am developing a photographic database in which each image has a fairly free-form description. I want users to be able to search this database. Initially I used LIKE '%...%', but in many cases that returns far too many matches: a search for cat, for example, also returns category and other words. So I decided to use REGEXP with \b to match word boundaries, which mostly works unless the search term contains special characters.

In my test database I have records with the following text fragments:
taken in Victoria by
taken in Melbourne (Victoria) by
taken in Melbourne (Victoria). This is
this is a Victorian monument

If my search is
LIKE '%victoria%'
I get all 4 records as I expect - but that is probably not what a user wants.

If my search is
REGEXP '\\bvictoria\\b'
I get only the first 3 records, as expected. So far so good, but I cannot control what my users look for. What if they want records 2 and 3 only? So I search for
REGEXP '\\(victoria\\)'
and I indeed get the two records I want. But if I then do
REGEXP '\\b\\(victoria\\)\\b'
I get nothing. And if I try to let them find only the 3rd record (the one with the trailing .), that fails too.
Any suggestions, please?
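
The behaviour described above can be reproduced outside the database. The following is a minimal sketch, assuming that Python's re module treats \b the same way as MariaDB's regex engine for these ASCII inputs (a stand-in, not a MariaDB test). The records list and the matching_records helper are hypothetical names introduced for illustration. It suggests why the last pattern finds nothing: \b only matches between a word character and a non-word character, and in " (Victoria) " the parenthesis has a space on its other side, so both neighbours are non-word characters.

```python
import re

# The four description fragments from the question.
records = [
    "taken in Victoria by",
    "taken in Melbourne (Victoria) by",
    "taken in Melbourne (Victoria). This is",
    "this is a Victorian monument",
]


def matching_records(pattern):
    """Return the 1-based numbers of the records that the pattern matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [n for n, text in enumerate(records, start=1) if regex.search(text)]


# Word boundaries around a plain word work as intended.
print(matching_records(r"\bvictoria\b"))      # [1, 2, 3]

# Escaped parentheses match the literal characters.
print(matching_records(r"\(victoria\)"))      # [2, 3]

# \b before "(" needs a word character on the other side; here it is a space.
print(matching_records(r"\b\(victoria\)\b"))  # []
```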

Answer 1

Score: 1

You can use the | syntax.

This will attempt to match either of the patterns.

\bvictoria\b|\(victoria\)

Wikipedia – Regular expression – POSIX extended.
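
As a rough illustration of the alternation, and of one possible way to generalise it to arbitrary user input, here is a sketch in Python. It assumes Python's re module as a stand-in for MariaDB's regex engine. The helper whole_term_pattern is hypothetical and not part of the answer above: it swaps \b for the lookarounds (?<!\w) and (?!\w), which only require that the term is not glued to a word character, and it escapes the user's text with re.escape.

```python
import re

# The four description fragments from the question.
records = [
    "taken in Victoria by",
    "taken in Melbourne (Victoria) by",
    "taken in Melbourne (Victoria). This is",
    "this is a Victorian monument",
]


def whole_term_pattern(term):
    """Build a pattern that matches `term` literally, not inside a longer word."""
    # re.escape neutralises (, ), . and other metacharacters in the user's text.
    return r"(?<!\w)" + re.escape(term) + r"(?!\w)"


def matching_records(pattern):
    """Return the 1-based numbers of the records that the pattern matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [n for n, text in enumerate(records, start=1) if regex.search(text)]


# The alternation from the answer: either branch may match.
print(matching_records(r"\bvictoria\b|\(victoria\)"))       # [1, 2, 3]

# Lookaround variant built from raw user input.
print(matching_records(whole_term_pattern("victoria")))     # [1, 2, 3]
print(matching_records(whole_term_pattern("(victoria)")))   # [2, 3]
print(matching_records(whole_term_pattern("(victoria).")))  # [3]
```

On these four fragments the alternation returns the same records as \bvictoria\b alone, so the application would still have to decide which branch to send for a given search. If a pattern built this way were passed to MariaDB, binding it as a query parameter rather than pasting it into a string literal would presumably avoid the extra backslash doubling; that part is untested here.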

Answer 2

Score: 1

Stopgap solution:

  • Use case-insensitive collations on the columns being searched

Medium solution:

Add a FULLTEXT index on the text column. Searches will then use WHERE MATCH(textfield) AGAINST('victoria') as the search criterion.

Partial offload solution:

SphinxSE brings full-text search capabilities into the MariaDB syntax. It's getting a little dated, as Sphinx has progressed significantly since abandoning development of SphinxSE.

Full Offload solutions:

You can get better full text search capabilities by using dedicated search engines like Sphinx, Solr, etc.

huangapple
  • Posted on 2023-06-29 06:39:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76577110.html