Replacing characters in String using Meta characters or character classes
Question
I am writing a program to remove all non-alphanumeric characters from a String that contains only lowercase letters.
I am using the replaceAll function and have looked at a few regexes.
My reference is https://www.vogella.com/tutorials/JavaRegularExpressions/article.html, which shows that:
- \s : A whitespace character, short for [ \t\n\x0b\r\f]
- \W : A non-word character [^\w]

I tried the following in Java, but the result didn't remove the spaces or symbols:
lowercased = lowercased.replaceAll("\\W\\s", "");
output:
amanaplanac analp anam a
May I know what is wrong?
Answer 1
Score: 3
Regex \W\s means "a non-word character followed by a whitespace character".
If you want to replace any character that is one of those, use one of these:
- \W|\s where | means or
- [\W\s] where [ ] is a character class that in this case merges the built-in special character classes \W and \s, because that's what those are.
Of the two, I recommend using the second.
Of course, having \s there is redundant: \s means whitespace character and \W means non-word character, and since whitespace characters are not word characters, using \W alone is enough.
lowercased = lowercased.replaceAll("\\W+", "");
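To see the difference between the patterns side by side, here is a minimal sketch. The sample input and the names NonWordDemo and strip are assumptions made for illustration; the question does not show the original string.

```java
class NonWordDemo {
    // Removes every match of the given regex from the input.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Assumed sample input, not taken from the question.
        String lowercased = "a man, a plan, a canal: panama";

        // \W\s needs two characters in sequence, so only ", " and ": " are removed.
        System.out.println(strip(lowercased, "\\W\\s"));   // a mana plana canalpanama

        // [\W\s] matches any single character from either class.
        System.out.println(strip(lowercased, "[\\W\\s]")); // amanaplanacanalpanama

        // \W+ alone gives the same result, because whitespace is already non-word.
        System.out.println(strip(lowercased, "\\W+"));     // amanaplanacanalpanama
    }
}
```

The + quantifier does not change the result here. It only lets each run of unwanted characters be replaced in one step instead of one character at a time.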
Answer 2
Score: 0
Use | (the or operator), as in \W|\s, since \W and \s are independent cases and you want to replace both. And since whitespace characters are not word characters, you can use \W alone.
lowercased = lowercased.replaceAll("\\W|\\s", "");
Answer 3
Score: 0
Regex \W matches characters that are not digits (0-9), letters (A-Z and a-z), or the underscore (_), and \s matches whitespace.
Since \W already matches every non-alphanumeric character except the underscore, there is no need for \s.
But this also means that with \W alone, underscores (_) are kept along with the alphanumeric characters.
Use the following to remove underscores as well.
lowercased = lowercased.replaceAll("\\W|_", "");
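The underscore case can be checked with a short sketch. The input string and the names UnderscoreDemo and strip are hypothetical, chosen only for illustration. The last pattern, [^a-z0-9]+, is an alternative not mentioned in the answers. It assumes the string has already been lowercased, as in the question.

```java
class UnderscoreDemo {
    // Removes every match of the given regex from the input.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Hypothetical input containing an underscore, punctuation and spaces.
        String lowercased = "snake_case, 2 words!";

        // \W treats '_' as a word character, so the underscore is kept.
        System.out.println(strip(lowercased, "\\W+"));        // snake_case2words

        // Adding '_' as an alternative removes it as well.
        System.out.println(strip(lowercased, "\\W|_"));       // snakecase2words

        // The same idea written as a single character class.
        System.out.println(strip(lowercased, "[\\W_]+"));     // snakecase2words

        // Alternative: keep only lowercase ASCII letters and digits.
        System.out.println(strip(lowercased, "[^a-z0-9]+"));  // snakecase2words
    }
}
```

By default, Java's \w covers only [a-zA-Z_0-9], so accented and other non-ASCII letters count as non-word characters and would be removed by all of these patterns.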