July 20, 2023, 11:29:00


Why does '³' (superscript 3) match the Python re pattern for alpha characters?

Question


This is the pattern I use to match a sequence of Unicode letters: "[^\W\d_]+"

But when I run:

import re
re.match("[^\W\d_]+", '³')

I get:

<re.Match object; span=(0, 1), match='³'>

Why?

Answer 1

Score: 3


I think [^\W\d_]+ matches word characters other than decimal digits and the underscore:

\W matches any character which is not a word character. This is the opposite of \w. If the ASCII flag is used this becomes the equivalent of [^a-zA-Z0-9_]. If the LOCALE flag is used, matches characters which are neither alphanumeric in the current locale nor the underscore. (From Python's re docs.)
\d matches decimal digits.
_ matches the underscore.
[^...] matches any single character that is not in the listed set.
So [^\W\d_] matches any character that is none of \W, \d, or _. In other words, it matches any word character except decimal digits and the underscore. '³' is a word character (str.isalnum() is true for it), but it is not a decimal digit (its Unicode category is No, not Nd) and it is not an underscore, so it matches.
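To make the reasoning above concrete, here is a minimal sketch that inspects a few characters and shows one possible way to accept letters only. The names WORD_NOT_DIGIT, describe, and is_letters_only are made up for this illustration and are not part of the re module. The letters-only check assumes that "letter" means a Unicode general category starting with L, which may or may not be what the asker wants (for example, it rejects combining marks).

```python
import re
import unicodedata

# The pattern from the question, written as a raw string so the
# backslashes reach the regex engine unchanged.
WORD_NOT_DIGIT = re.compile(r"[^\W\d_]+")


def describe(ch):
    """Report the properties that decide whether ch matches the pattern."""
    return {
        "category": unicodedata.category(ch),  # 'No' for '³', 'Nd' for '3'
        "isalnum": ch.isalnum(),      # \w follows this (plus the underscore)
        "isdecimal": ch.isdecimal(),  # decimal digits, which \d matches
        "matches": WORD_NOT_DIGIT.fullmatch(ch) is not None,
    }


def is_letters_only(text):
    """Hypothetical stricter check: every character has a letter category (L*)."""
    return bool(text) and all(
        unicodedata.category(c).startswith("L") for c in text
    )


if __name__ == "__main__":
    for ch in ["a", "é", "3", "³", "_"]:
        print(repr(ch), describe(ch))
    print(is_letters_only("héllo"), is_letters_only("³"))
```

If only letters should be accepted, str.isalpha() on the candidate string is another option, since '³'.isalpha() is False.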
