MariaDB REGEX with escaped characters
Question
I am developing a photographic database in which each image has a fairly free-form description. I want users to be able to search this database. Initially I used LIKE '%...%', but in many cases that gives far too many matches - e.g. a search for cat may return category and other words. So I decided to use REGEXP with \b to match word boundaries, which mostly works unless there are special characters.
In my test database I have records with the following text fragments:
taken in Victoria by
taken in Melbourne (Victoria) by
taken in Melbourne (Victoria). This is
this is a Victorian monument
If my search is
LIKE '%victoria%'
I get all 4 records, as I expect - but that is probably not what a user wants.
If my search is
REGEXP '\\bvictoria\\b'
I get, as expected, the first 3 records only. So far so good - but I cannot control what my users look for. What if they want records 2 and 3 only? So I search for
REGEXP '\\(victoria\\)'
and I indeed get the two records I want. But if I then do
REGEXP '\\b\\(victoria\\)\\b'
I get nothing. And if I try to let them search for only the 3rd record (with the trailing .), it also fails.
Any suggestions please?
Answer 1
Score: 1
You can use the | (alternation) syntax.
This will attempt to match either of the patterns.
\bvictoria\b|\(victoria\)
Wikipedia – Regular expression – POSIX extended.
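To see how these patterns behave on the four fragments from the question, here is a minimal Python sketch. It assumes Python's re module treats \b the same way as the PCRE library that MariaDB uses for REGEXP (a boundary exists only between a word character and a non-word character). The names RECORDS, matching_records and whole_term_pattern are hypothetical, introduced only for this illustration. The helper shows one possible alternative to \b: escape the user's term, then use lookarounds so the term may start or end with punctuation.

```python
import re

# The four description fragments from the question.
RECORDS = [
    "taken in Victoria by",
    "taken in Melbourne (Victoria) by",
    "taken in Melbourne (Victoria). This is",
    "this is a Victorian monument",
]


def matching_records(pattern):
    """Return the 1-based numbers of the records the pattern matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [i for i, text in enumerate(RECORDS, start=1) if regex.search(text)]


def whole_term_pattern(term):
    """Hypothetical helper: escape the term, then require that it is not
    glued to a word character on either side (lookarounds, not a boundary)."""
    return r"(?<!\w)" + re.escape(term) + r"(?!\w)"


print(matching_records(r"\bvictoria\b"))                # → [1, 2, 3]
print(matching_records(r"\(victoria\)"))                # → [2, 3]
# "(" and ")" are non-word characters with non-word neighbours,
# so no word boundary exists next to them and nothing matches.
print(matching_records(r"\b\(victoria\)\b"))            # → []
print(matching_records(r"\bvictoria\b|\(victoria\)"))   # → [1, 2, 3]
print(matching_records(whole_term_pattern("(victoria)")))   # → [2, 3]
print(matching_records(whole_term_pattern("(victoria).")))  # → [3]
```

If the same idea is carried over to MariaDB, the pattern would be built in the application and passed as a bound parameter. This assumes the server's regex library accepts lookarounds, which should be checked against the MariaDB version in use.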
Answer 2
Score: 1
Stop-gap solution:
- Use case-insensitive collations on the table elements being searched
Medium solution:
- Add a FULLTEXT index on the column being searched
This will use WHERE MATCH(textfield) AGAINST('victoria')
as the search criterion.
Partial offload solution:
SphinxSE brings full-text search capabilities into the MariaDB syntax. It's getting a little dated, as Sphinx has progressed significantly since abandoning development of SphinxSE.
Full offload solutions:
You can get better full-text search capabilities by using dedicated search engines like Sphinx, Solr, etc.