Replacing characters in String using Meta characters or character classes
Question
I am writing a program to remove all non-alphanumeric characters from a String that contains only lowercase letters.
I am using the replaceAll function and have looked at a few regexes.
My reference is https://www.vogella.com/tutorials/JavaRegularExpressions/article.html, which shows that:
- \s : A whitespace character, short for [ \t\n\x0b\r\f]
- \W : A non-word character [^\w]
I tried the following in Java, but the result still contained spaces and symbols:
lowercased = lowercased.replaceAll("\\W\\s", "");
Output:
amanaplanac analp anam a
May I know what is wrong?
Answer 1
Score: 3
The regex \W\s means "a non-word character followed by a whitespace character".
If you want to replace any character that is one of those, use one of these:
- \W|\s where | means "or"
- [\W\s] where [ ] is a character class that in this case merges the built-in special character classes \W and \s, because that's what those are.
Of the two, I recommend using the second.
Of course, having \s there is redundant, because \s means whitespace character and \W means non-word character, and since whitespace characters are not word characters, using \W alone is enough.
lowercased = lowercased.replaceAll("\\W+", "");
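To make the difference between the three patterns in this answer concrete, here is a minimal sketch. The class name, method names, and sample input are made up for illustration; only String.replaceAll comes from the question.

```java
public class PatternComparison {
    // Sequence: one non-word character immediately followed by one whitespace character.
    static String stripSequence(String s) {
        return s.replaceAll("\\W\\s", "");
    }

    // Character class: any single character that is non-word OR whitespace.
    static String stripClass(String s) {
        return s.replaceAll("[\\W\\s]", "");
    }

    // \W alone, with + so that runs of non-word characters are removed in one match.
    static String stripNonWord(String s) {
        return s.replaceAll("\\W+", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the question.
        String lowercased = "a man, a plan, a canal: panama";

        System.out.println(stripSequence(lowercased)); // a mana plana canalpanama
        System.out.println(stripClass(lowercased));    // amanaplanacanalpanama
        System.out.println(stripNonWord(lowercased));  // amanaplanacanalpanama
    }
}
```

With the sequence pattern, only the pairs ", " and ": " are removed. A space that sits between two letters is left alone because the character after it is not whitespace.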
Answer 2
Score: 0
Use | (the "or" operator), as in \W|\s, since \W and \s are independent cases that you both want to replace. And since whitespace characters are not word characters, you can use \W alone.
lowercased = lowercased.replaceAll("\\W|\\s", "");
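As a quick check that the \s alternative adds nothing, the sketch below runs both patterns over a string containing a tab, a newline, a space, and a symbol. The class name, method names, and input are assumptions chosen for the demonstration, and it relies on Java's default (non-Unicode) definitions of \W and \s.

```java
public class AlternationCheck {
    // Alternation: a non-word character or a whitespace character.
    static String stripWithAlternation(String s) {
        return s.replaceAll("\\W|\\s", "");
    }

    // \W alone: whitespace is already a non-word character.
    static String stripNonWordOnly(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        // Hypothetical input mixing tab, newline, space, and punctuation.
        String input = "a\tb\nc d!e";

        System.out.println(stripWithAlternation(input)); // abcde
        System.out.println(stripNonWordOnly(input));     // abcde
    }
}
```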
Answer 3
Score: 0
The regex \W matches any character that is not a digit (0-9), a letter (A-Z and a-z), or an underscore (_). And \s matches whitespace.
Since \W already matches every non-alphanumeric character except the underscore, whitespace included, there is no need to use \s.
So if you use \W alone, underscores (_) are kept along with the alphanumeric characters.
Use the following to remove underscores as well:
lowercased = lowercased.replaceAll("\\W|_", "");
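The underscore behaviour is easy to see with a small example. This is only a sketch: the class name, method names, and sample input are invented here, and the [^a-z0-9] variant is an alternative not mentioned in the answer that assumes the input has already been lowercased (it would also strip uppercase letters).

```java
public class UnderscoreCheck {
    // \W keeps underscores, because _ counts as a word character.
    static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    // The pattern from the answer: non-word characters or underscores.
    static String stripNonWordAndUnderscore(String s) {
        return s.replaceAll("\\W|_", "");
    }

    // Alternative (assumes lowercase input): keep only a-z and 0-9.
    static String keepLowerAlphanumeric(String s) {
        return s.replaceAll("[^a-z0-9]", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample input containing underscores.
        String lowercased = "snake_case value_1!";

        System.out.println(stripNonWord(lowercased));              // snake_casevalue_1
        System.out.println(stripNonWordAndUnderscore(lowercased)); // snakecasevalue1
        System.out.println(keepLowerAlphanumeric(lowercased));     // snakecasevalue1
    }
}
```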