September 12, 2020 16:43:18

Why does Java's replaceAll() with a regular expression need "\\" in front?

Question

For an input String id, I want to perform the four steps below:

Remove every character that is not a lowercase letter, a digit, "-", "_", or ".".
If "." occurs several times in a row, replace the run with a single "." (e.g., he......llo -> he.llo).
If the string starts with ".", remove it.
If the string ends with ".", remove it.

Here are the four lines of my code:

id = id.replaceAll("[^" + "a-z" + "0-9" + "-" + "_" + "." + "]", "");
id = id.replaceAll(".{2,}",".");
id = id.replaceAll("^.","");
id = id.replaceAll(".$","");

I found that rule 2 returns "." (e.g., he...llo -> .),
and that rules 3 and 4 remove a character even when it is not ".".

So I fixed the code like this:

id = id.replaceAll("[^" + "a-z" + "0-9" + "-" + "_" + "." + "]", "");
id = id.replaceAll("\\.{2,}",".");
id = id.replaceAll("\\^.","");
id = id.replaceAll("\\.$","");

And it works fine.
What I don't understand is this: does a regular expression need "\" added twice before it is used?
If so, why does rule 1 work just fine? Can someone give me a specific answer?
Finally, I wonder whether I can code rule 3 and rule 4 at once, with something like &&?

Answer 1

Score: 1

. in a regular expression means "match any single character".
\. in a regular expression means "match a single dot/period/full-stop character". A different way to write this is [.], which has the same end result but is semantically different (I'm not sure whether this has a negative impact on the code generated to match the expression).
[abc.] in a regular expression means "match a single character that must be 'a' or 'b' or 'c' or '.'" ([^…] inverts the meaning: match any character that is not listed). This is why rule 1 works without a backslash: inside a character class, . is already a literal dot. Attention: - has special meaning in a character class, so always put it first or last if you want to match the hyphen character specifically.
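The three notations above can be compared on the he...llo example from the question. This is only a minimal sketch; the class name DotMatchingDemo and its method names are made up for illustration and are not part of the original code.

```java
final class DotMatchingDemo {
    // Unescaped dot: ".{2,}" matches any run of two or more characters,
    // so the whole input is treated as one match.
    static String collapseUnescaped(String s) {
        return s.replaceAll(".{2,}", ".");
    }

    // Escaped dot: "\\.{2,}" matches only runs of two or more literal dots.
    static String collapseEscaped(String s) {
        return s.replaceAll("\\.{2,}", ".");
    }

    // Character class: inside [...] the dot is literal without a backslash.
    static String collapseWithClass(String s) {
        return s.replaceAll("[.]{2,}", ".");
    }

    public static void main(String[] args) {
        String input = "he...llo";
        System.out.println(collapseUnescaped(input)); // prints "."
        System.out.println(collapseEscaped(input));   // prints "he.llo"
        System.out.println(collapseWithClass(input)); // prints "he.llo"
    }
}
```

The first call reproduces the behavior described in the question: the unescaped pattern swallows the entire string and leaves a single dot.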

As for why the backslash has to be doubled: Java itself uses the backslash to escape characters in a string literal. To get a literal backslash into the string, you have to escape the backslash itself: "\\" is a string containing a single backslash character ("\" is a syntax error in Java, because the backslash escapes the following quotation mark, i.e. the string is never terminated). So when the Java source says "\\.", the regex engine receives \. and matches a literal dot.
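To make the two escaping levels visible, here is a small sketch that inspects the string the regex engine actually receives. The class BackslashLevelsDemo and its members are hypothetical names chosen for this example; Pattern.quote is shown as one possible way to avoid writing the backslashes by hand.

```java
import java.util.regex.Pattern;

final class BackslashLevelsDemo {
    // The Java literal "\\." holds two characters: a backslash and a dot.
    static final String ESCAPED_DOT = "\\.";

    // Number of characters the regex engine sees for the literal above.
    static int patternLength() {
        return ESCAPED_DOT.length();
    }

    // True only when the whole input is exactly one literal dot.
    static boolean matchesOnlyDot(String s) {
        return Pattern.matches(ESCAPED_DOT, s);
    }

    // Pattern.quote(".") yields a pattern that matches the dot literally,
    // so no hand-written backslashes are needed.
    static String stripTrailingDot(String s) {
        return s.replaceAll(Pattern.quote(".") + "$", "");
    }

    public static void main(String[] args) {
        System.out.println(patternLength());          // prints 2
        System.out.println(matchesOnlyDot("."));      // prints true
        System.out.println(matchesOnlyDot("a"));      // prints false
        System.out.println(stripTrailingDot("abc.")); // prints abc
    }
}
```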

To reduce your logic to two replaceAll calls, I suggest changing the order of your calls and then joining your expressions as alternatives with the | operator:

id = id.replaceAll("\\.+", ".") // collapse each run of dots into one
        .replaceAll("[^a-z0-9_.-]|^\\.|\\.$", "");
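One caveat with collapsing dots first: if a removed character sits between two dots (for example a.#.b), the dots only become adjacent after the second call, so a doubled dot can remain. The sketch below is one possible alternative, not the answerer's code: it keeps the question's original order and merges only rules 3 and 4 with |. The names IdNormalizer, normalize, and twoCalls are assumptions made for illustration.

```java
final class IdNormalizer {
    // Three calls in the question's original order; rules 3 and 4 share one pattern.
    static String normalize(String id) {
        return id
                .replaceAll("[^a-z0-9_.-]", "") // rule 1: drop disallowed characters
                .replaceAll("\\.{2,}", ".")     // rule 2: collapse runs of dots
                .replaceAll("^\\.|\\.$", "");   // rules 3 and 4: trim one leading/trailing dot
    }

    // The two-call variant with dots collapsed first, kept here for comparison.
    static String twoCalls(String id) {
        return id
                .replaceAll("\\.+", ".")
                .replaceAll("[^a-z0-9_.-]|^\\.|\\.$", "");
    }

    public static void main(String[] args) {
        System.out.println(normalize("he......llo")); // prints he.llo
        System.out.println(normalize(".a.#.b."));     // prints a.b
        System.out.println(twoCalls("a.#.b"));        // prints a..b
    }
}
```

Because rule 2 has already run, at most one dot can sit at each end, so the single alternation ^\\.|\\.$ is enough to cover both rule 3 and rule 4.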
