2023年6月1日 00:44:15go评论92阅读模式

英文:

Why is this python regular expression not ignoring accents?

问题

我正在使用以下正则表达式作为连接到MongoDB数据库的应用程序的筛选器：

{"$regex": re.compile(r'\b' + re.escape(value) + r'\b', re.IGNORECASE | re.UNICODE)}

这个正则表达式满足我的搜索条件，但我遇到了一个问题，即它不会忽略重音。例如：

数据库条目是：“Escobar，el patrón del mal Colombia historia”。

然后我搜索“El patron”。

我没有得到任何结果，因为字母O中的“重音”不允许我获取记录。我该如何修复它？我以为使用了re.UNICODE部分会忽略这个问题。

英文:

I am using the following regular expression for a filter of an application that connects to a MongoDB database:

{&quot;$regex&quot;: re.compile(r&#39;\b&#39; + re.escape(value) + r&#39;\b&#39;, re.IGNORECASE | re.UNICODE)}

The regular expression meets my search criteria however I have a problem and that is that it does not ignore accents. For example:

The database entry is: "Escobar, el patrón del mal Colombia historia".

And I search for "El patron".

I do not get any result because the "accent" in the letter O does not let me fetch the record. How can I fix it? I thought that with the re.UNICODE part I would ignore this.

答案1

得分: 2

因为 o 和 ó 是不同的字符。 re.UNICODE 并不是你所想的那样工作，你可以在这里阅读关于它的信息：https://docs.python.org/3/library/re.html#re.ASCII

在使用正则表达式搜索之前，你可以通过首先预处理字符串将所有这些字符转换为它们关联的 ASCII 字符来解决这个问题。请参考：https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-normalize-in-a-python-unicode-string

英文:

Because o and ó are different characters. re.UNICODE does not do what you think it does, you can read about it here: https://docs.python.org/3/library/re.html#re.ASCII

You can solve this issue by first preprocessing strings to convert all such characters to their associated ascii counterparts before searching through with a regex. See: https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-normalize-in-a-python-unicode-string

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

为什么这个 Python 正则表达式没有忽略重音符号？

问题

答案1

尝试使用exec.Command在Golang中执行Python代码。

拖放功能适用于移动设备

如何在Python中在特定日期为一个国家分配一个季节？

如何模拟pymysqlpool.ConnectionPool构造函数？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。