Why does Java's replaceAll() need "\\" added in front when using a regular expression?

Question

For an input String id, I want to perform the four steps below:

  1. Remove every character that is not a lowercase letter, a digit, "-", "_", or ".".
  2. If there are multiple consecutive "." characters, replace them with a single "." (e.g. he......llo -> he.llo).
  3. If the string starts with ".", remove it.
  4. If the string ends with ".", remove it.

Here are the four lines of my code:

    id = id.replaceAll("[^" + "a-z" + "0-9" + "-" + "_" + "." + "]", "");
    id = id.replaceAll(".{2,}",".");
    id = id.replaceAll("^.","");
    id = id.replaceAll(".$","");

I found that rule 2 returns "." (e.g. he...llo -> .), and that rules 3 and 4 remove characters that are not ".".

So I fixed the code like this:

    id = id.replaceAll("[^" + "a-z" + "0-9" + "-" + "_" + "." + "]", "");
    id = id.replaceAll("\\.{2,}",".");
    id = id.replaceAll("\\^.","");
    id = id.replaceAll("\\.$","");

And it works fine. I just don't understand why: does a regular expression need "\" added twice in front of it before it is used? If so, why does rule 1 work just fine? Can anyone give me a specific answer? Finally, can I code rules 3 and 4 in a single call, for example with something like &&?

Answer 1

Score: 1

  • . in a regular expression means "match any single character".
  • \. in a regular expression means "match a single dot/period/full-stop character". Another way to write this is [.], which has the same end result but is semantically different (I'm not sure whether this has a negative impact on the generated matching code).
  • [abc.] in a regular expression means "match a single character that must be 'a', 'b', 'c', or '.'" ([^…] inverts the meaning: match any character that is not in the list). Note: - has a special meaning in a character class, so always put it first or last if you want to match the hyphen character specifically.
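
The three bullets can be checked directly with a short sketch. The class name DotDemo, its method names, and the sample strings are illustrative assumptions, not part of the question. It also shows why a dot needs no escaping inside a character class.

```java
class DotDemo {
    // Unescaped dot: matches any character, so the whole input is one match.
    static String unescaped(String s) {
        return s.replaceAll(".{2,}", ".");
    }

    // Escaped dot: only runs of two or more literal dots are folded.
    static String escaped(String s) {
        return s.replaceAll("\\.{2,}", ".");
    }

    // Character class: inside [...], the dot is already a literal dot.
    static String charClass(String s) {
        return s.replaceAll("[.]{2,}", ".");
    }

    // Negated class with the hyphen placed last so it is taken literally.
    static String keepAllowed(String s) {
        return s.replaceAll("[^a-z0-9_.-]", "");
    }

    public static void main(String[] args) {
        System.out.println(unescaped("he...llo"));     // .
        System.out.println(escaped("he...llo"));       // he.llo
        System.out.println(charClass("he...llo"));     // he.llo
        System.out.println(keepAllowed("He.l-l_o!9")); // e.l-l_o9
    }
}
```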

As for why the backslash has to be duplicated: Java itself uses the backslash to escape characters in a string literal. To get a literal backslash as part of the string, you have to escape the backslash itself: "\\" is a string containing a single backslash character ("\" is a syntax error in Java, because the backslash escapes the following quotation mark, i.e. the string is never terminated). So the regular expression \. has to be written as "\\." in Java source code.
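
To make the two escaping layers visible, a small sketch can print what the regex engine actually receives. EscapeDemo and its members are hypothetical names; Pattern.quote is the standard java.util.regex helper for treating a string literally, shown here as one possible way to avoid hand-written escapes.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // The Java literal "\\." is two characters at runtime: a backslash and a dot.
    static final String ESCAPED_DOT = "\\.";

    // Removes one trailing dot using the hand-escaped pattern.
    static String stripTrailingDot(String s) {
        return s.replaceAll("\\.$", "");
    }

    // Same idea without manual escapes: Pattern.quote wraps the dot in \Q...\E.
    static String stripTrailingDotQuoted(String s) {
        return s.replaceAll(Pattern.quote(".") + "$", "");
    }

    public static void main(String[] args) {
        System.out.println(ESCAPED_DOT.length());           // 2
        System.out.println(ESCAPED_DOT);                    // \.
        System.out.println(stripTrailingDot("abc."));       // abc
        System.out.println(stripTrailingDot("abc"));        // abc
        System.out.println(stripTrailingDotQuoted("abc.")); // abc
    }
}
```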

To reduce your logic to two replaceAll calls, I would suggest changing the order of your calls and then joining your expressions as alternatives with the | operator:

    id = id.replaceAll("\\.+", ".") // fold all dots
            .replaceAll("[^a-z0-9_.-]|^\\.|\\.$", "");
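
For comparison, here is a sketch that keeps the question's original step order (strip invalid characters, fold dots, then trim the ends) next to the two-call ordering. IdSanitizer, sanitize, and sanitizeTwoCalls are hypothetical names, and the inputs are made up for illustration. Because the two-call form folds dots before removing invalid characters, the results can differ when an invalid character sits between dots, as the last two lines of main show.

```java
class IdSanitizer {
    // Applies the four rules from the question in their original order.
    static String sanitize(String id) {
        return id
                .replaceAll("[^a-z0-9_.-]", "")  // rule 1: drop disallowed characters
                .replaceAll("\\.{2,}", ".")      // rule 2: fold runs of dots
                .replaceAll("^\\.|\\.$", "");    // rules 3 and 4: trim a dot at either end
    }

    // Two-call ordering: fold dots first, then remove everything else.
    static String sanitizeTwoCalls(String id) {
        return id
                .replaceAll("\\.+", ".")
                .replaceAll("[^a-z0-9_.-]|^\\.|\\.$", "");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("..He...llo-_W."));         // e.llo-_
        System.out.println(sanitizeTwoCalls("..He...llo-_W.")); // e.llo-_
        System.out.println(sanitize("a.#.b"));                  // a.b
        System.out.println(sanitizeTwoCalls("a.#.b"));          // a..b
    }
}
```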

  • Posted by huangapple on 2020-09-12 16:43:18
  • Link to this post: https://go.coder-hub.com/63858509.html