Why does Java's replaceAll() with a regular expression need "\\" in front?
Question
For an input String id, I want to perform the following 4 steps:
- Remove every character that is not a lowercase letter, a digit, "-", "_", or ".".
- If there are several consecutive "." characters, replace them with a single "." (e.g. he......llo -> he.llo).
- If the string starts with ".", remove it.
- If the string ends with ".", remove it.
Here are the 4 lines of my code:
id = id.replaceAll("[^" + "a-z" + "0-9" + "-" + "_" + "." + "]", "");
id = id.replaceAll(".{2,}",".");
id = id.replaceAll("^.","");
id = id.replaceAll(".$","");
I found that rule 2 returns "." (e.g. he...llo -> .), and that rules 3 and 4 remove characters that are not ".".
So I fixed the code like this:
id = id.replaceAll("[^" + "a-z" + "0-9" + "-" + "_" + "." + "]", "");
id = id.replaceAll("\\.{2,}",".");
id = id.replaceAll("\\^.","");
id = id.replaceAll("\\.$","");
And it works fine.
I just don't understand why. Does a regular expression need "\" added twice in front before it can be used?
If so, why does rule 1 work just fine? Can someone give me a precise answer?
Finally, I wonder whether I can code rule 3 and rule 4 at once, using something like &&?
Answer 1
Score: 1
- . in a regular expression means "match any single character".
- \. in a regular expression means "match a single dot/period/full-stop character". Another way to write this is [.], which has the same end result but is semantically different (I'm not sure whether this has a negative impact on the code generated to match the expression).
- [abc.] in a regular expression means "match a single character that must be 'a' or 'b' or 'c' or '.'" ([^…] inverts the meaning: match any character that is not one of those). Attention: - has special meaning in a character class, so always put it first or last if you want to match the hyphen character specifically.
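The three cases above can be checked with a small sketch. The class name DotDemo and the helper mark are made up for illustration; only String.replaceAll comes from the JDK. Each call replaces whatever the pattern matches with '#', so the output shows what was matched.

```java
public class DotDemo {
    // Hypothetical helper: replaces every regex match with '#'
    // so that it is visible which characters the pattern matched.
    static String mark(String input, String regex) {
        return input.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        System.out.println(mark("a.b", "."));        // unescaped dot matches every character: ###
        System.out.println(mark("a.b", "\\."));      // escaped dot matches only the literal '.': a#b
        System.out.println(mark("a.b", "[.]"));      // a dot inside a character class is literal: a#b
        System.out.println(mark("a-b.c", "[a.-]"));  // hyphen placed last is a literal '-': ##b#c
    }
}
```

This is also why rule 1 in the question needs no backslash: its dot sits inside a character class.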
As for why the backslash has to be doubled: Java itself uses the backslash to escape characters in a string literal. To get a literal backslash as part of the string, you have to escape the backslash itself: "\\" is a string containing a single backslash character ("\" is a syntax error in Java, because the backslash escapes the following quotation mark, i.e. the string is never terminated). So the regular expression \. has to be written as the Java string literal "\\.".
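To make the two levels of escaping concrete, here is a minimal sketch. The class EscapeDemo and its method names are hypothetical; String.replaceAll and Pattern.quote are standard JDK calls. It compares the question's unescaped rule 2 with the escaped one, and shows Pattern.quote as one way to avoid writing the backslashes by hand.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // The Java literal "\\." holds two characters at runtime: a backslash and a dot.
    static final String ESCAPED_DOT = "\\.";

    // Rule 2 with the dot escaped: only runs of two or more literal dots are folded.
    static String foldDots(String id) {
        return id.replaceAll("\\.{2,}", ".");
    }

    // Rule 2 as first written in the question: "any character, two or more times".
    static String foldAnything(String id) {
        return id.replaceAll(".{2,}", ".");
    }

    // Pattern.quote turns the text into a literal pattern, so no manual escaping is needed.
    static String foldDotsQuoted(String id) {
        return id.replaceAll("(?:" + Pattern.quote(".") + "){2,}", ".");
    }

    public static void main(String[] args) {
        System.out.println("\\".length());            // 1
        System.out.println(ESCAPED_DOT.length());     // 2
        System.out.println(foldDots("he...llo"));     // he.llo
        System.out.println(foldAnything("he...llo")); // .
        System.out.println(foldDotsQuoted("he...llo")); // he.llo
    }
}
```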
To reduce your logic to two replaceAll calls, I would suggest changing the order of your calls and then joining your expressions as alternatives with the | operator:
id = id.replaceAll("\\.+", ".") // fold consecutive dots into one
       .replaceAll("[^a-z0-9_.-]|^\\.|\\.$", "");
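One caveat with folding the dots first: removing the invalid characters afterwards can bring dots back together (a.#.b ends up as a..b), and a dot that only becomes the first character after the removal is not stripped. The sketch below, using a hypothetical class IdNormalizer, keeps the question's original order and merges only rules 3 and 4 with |. A fold-first variant with the dot escaped is included for comparison. Note also that "\\^." in the question escapes the caret, so it looks for a literal ^ followed by any character; the anchor has to stay unescaped, as in "^\\.".

```java
public class IdNormalizer {
    // Keeps the question's rule order; rules 3 and 4 share one call via '|'.
    static String normalize(String id) {
        return id
            .replaceAll("[^a-z0-9_.-]", "")  // rule 1: drop everything outside a-z, 0-9, '_', '.', '-'
            .replaceAll("\\.{2,}", ".")      // rule 2: fold runs of dots into a single dot
            .replaceAll("^\\.|\\.$", "");    // rules 3 and 4: strip a leading or a trailing dot
    }

    // Fold-first variant, shown only to compare results.
    static String foldFirst(String id) {
        return id
            .replaceAll("\\.+", ".")
            .replaceAll("[^a-z0-9_.-]|^\\.|\\.$", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("he......llo")); // he.llo
        System.out.println(normalize(".He#l.lo."));   // el.lo
        System.out.println(normalize("a.#.b"));       // a.b
        System.out.println(foldFirst("a.#.b"));       // a..b
    }
}
```

Whether the two-call form is acceptable depends on whether inputs such as a.#.b can occur; if they can, the three-call order above is the safer choice.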