Replacing characters in a String using metacharacters or character classes

Question

I am writing a program to remove all non-alphanumeric characters from a String that contains only lowercase letters.

I am using the replaceAll function and have looked at a few regexes.

My reference is https://www.vogella.com/tutorials/JavaRegularExpressions/article.html, which shows that:

  • \s : A whitespace character, short for [ \t\n\x0b\r\f]
  • \W : A non-word character [^\w]

I tried the following in Java, but the result still contained spaces and symbols:

lowercased = lowercased.replaceAll("\\W\\s", "");

output:

amanaplanac analp anam a

May I know what is wrong?

Answer 1

Score: 3

Regex \W\s means "a non-word character followed by a whitespace character".

If you want to replace any character that is one of those, use one of these:

  • \W|\s where | means or

  • [\W\s] where [ ] is a character class that, in this case, merges \W and \s, which are themselves predefined character classes.

Of the two, I recommend using the second.


Of course, including \s is redundant: \s means a whitespace character and \W means a non-word character, and since whitespace characters are not word characters, \W alone is enough.

lowercased = lowercased.replaceAll("\\W+", "");
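To make the difference concrete, here is a minimal sketch comparing the three patterns discussed above. The class name RegexCompare, the helper strip, and the sample string are made up for illustration and are not from the question.

```java
// Hypothetical demo class; the names and the sample input are assumptions.
final class RegexCompare {

    // Removes every match of the given regex from the input.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String sample = "a man, a plan: panama";

        // Sequence: a non-word character immediately followed by whitespace.
        // Only ", " and ": " match, so single spaces survive.
        System.out.println(strip(sample, "\\W\\s"));   // a mana planpanama

        // Character class: any single character that is non-word or whitespace.
        System.out.println(strip(sample, "[\\W\\s]")); // amanaplanpanama

        // \W already covers whitespace; + removes each run in one match.
        System.out.println(strip(sample, "\\W+"));     // amanaplanpanama
    }
}
```

The first pattern only removes two-character sequences, which is why isolated spaces remained in the question's output.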

Answer 2

Score: 0

Use | (the OR operator), as in \W|\s, since \W and \s are independent cases and you want to replace both. And since whitespace characters are not word characters, you can use \W alone.

lowercased = lowercased.replaceAll("\\W|\\s", "");

Answer 3

Score: 0

Regex \W matches any character that is not a digit (0-9), a letter (A-Z and a-z), or an underscore (_). And \s matches whitespace.

Since \W already matches every non-alphanumeric character except the underscore, there is no need to use \s.

So if you use \W alone, underscores (_) are kept along with the alphanumeric characters.

Use the following to remove underscores as well.

lowercased = lowercased.replaceAll("\\W|_", "");
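As a rough illustration of the underscore point, the sketch below contrasts \W+ with a character class that also lists the underscore. The class name UnderscoreDemo, the helper keepAlphanumerics, and the sample string are hypothetical; [\W_]+ is just an equivalent way of writing \W|_ that removes whole runs at once.

```java
// Hypothetical demo class; the names and the sample input are assumptions.
final class UnderscoreDemo {

    // Removes non-word characters and underscores, leaving letters and digits.
    static String keepAlphanumerics(String input) {
        return input.replaceAll("[\\W_]+", "");
    }

    public static void main(String[] args) {
        String sample = "snake_case, 42!";

        // \W treats the underscore as a word character, so it stays.
        System.out.println(sample.replaceAll("\\W+", "")); // snake_case42

        // Adding _ to the class removes it too.
        System.out.println(keepAlphanumerics(sample));     // snakecase42
    }
}
```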

huangapple
  • Posted on September 6, 2020 at 02:00:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63756988.html