Why does '³' (superscript 3) match the Python regex for alpha characters?
Question
This is the pattern I use to match a sequence of Unicode letters: r"[^\W\d_]+"
But when I run:
import re
re.match(r"[^\W\d_]+", '³')
I get:
<re.Match object; span=(0, 1), match='³'>
Why?
Answer 1
Score: 3
I think [^\W\d_]+ matches word characters other than decimal digits and the underscore:
\W matches any character which is not a word character. This is the opposite of \w. If the ASCII flag is used this becomes the equivalent of [^a-zA-Z0-9_]. If the LOCALE flag is used, matches characters which are neither alphanumeric in the current locale nor the underscore. (from Python's re docs)
\d matches decimal digits.
_ matches the underscore.
[^blablabla] matches anything but blablabla.
[^\W\d_] matches anything but \W, \d and _.
Which means it matches anything except non-word characters, decimal digits, and the underscore. In other words, it matches word characters that are neither decimal digits nor the underscore. '³' (U+00B3) is a word character, because \w accepts any character for which str.isalnum() is true. It is not a decimal digit, because \d only matches Unicode category Nd and '³' is in category No. It is not an underscore either, so it matches.
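The reasoning above can be checked directly. The following is only a sketch: the helper name describe and the compiled pattern name WORD_NO_DIGIT are made up for illustration, and str.isalpha() is shown as one possible stricter test for letters, not as the only way to do it.

```python
import re
import unicodedata

# The pattern from the question, written as a raw string.
WORD_NO_DIGIT = re.compile(r"[^\W\d_]+")


def describe(ch):
    """Report how a single character is classified (hypothetical helper)."""
    return {
        "category": unicodedata.category(ch),      # Unicode general category
        "isalpha": ch.isalpha(),                   # letters only (L* categories)
        "isalnum": ch.isalnum(),                   # letters plus numeric characters
        "matches_w": re.fullmatch(r"\w", ch) is not None,
        "matches_d": re.fullmatch(r"\d", ch) is not None,
        "matches_pattern": WORD_NO_DIGIT.fullmatch(ch) is not None,
    }


if __name__ == "__main__":
    for ch in ("a", "3", "_", "³"):
        print(repr(ch), describe(ch))
```

For '³' this should report category No, a \w match, and no \d match, which is why the pattern accepts it. If only letters are wanted, checking text.isalpha() instead of using the regex is one option, since '³'.isalpha() is False.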