Question

I am experimenting with regular expressions in Java, in particular with groups. I am trying to strip empty tags from a string containing XML. Without groups everything seems fine, but as soon as I define a regex using groups, magic begins that I don't understand. I expect behavior like the last assertion in the code below:

    @Test
    public void testRegexpGroups() {
        String xml =
            "<root>\n" +
                "    <yyy></yyy>\n" +
                "    <yyy>456</yyy>\n" +
                "    <aaa>  \n\n" +
                "    </aaa>\n" +
                "</root>";
        Pattern patternA = Pattern.compile("(\\s*)<(\\s*\\w+\\s*)>(\\1)</(\\2)>");
        Pattern patternB = Pattern.compile("(\\s*)<(\\s*\\w+\\s*)>\\s*</(\\2)>");
        Pattern patternC = Pattern.compile("\\s*<\\s*\\w+\\s*>\\s*</\\s*\\w+\\s*>");

        assertEquals(
            "<root>\n" +
            "    \n" +
            "    <yyy>456</yyy>\n" +
            "    <aaa>  \n" +
            "\n" +
            "    </aaa>\n" +
            "</root>",
            patternA.matcher(xml).replaceAll("")
        );

        assertEquals(
            "<root>\n" +
                "    <yyy>456</yyy>\n" +
                "</root>",
            patternB.matcher(xml).replaceAll("")
        );

        assertEquals(
            "<root>\n" +
                "    <yyy>456</yyy>\n" +
                "</root>",
            patternC.matcher(xml).replaceAll("")
        );
    }

I can get that result with this regex: "\\s*<\\s*\\w+\\s*>\\s*</\\s*\\w+\\s*>", but I don't understand why I can't do the same with "(\\s*)<(\\s*\\w+\\s*)>(\\1)</(\\2)>".
Please explain the difference in behavior between the regular expressions shown here.

Answer 1

Score: 0

In regular expressions, \1 and \2 are called back references. A back reference matches exactly the same text that the corresponding capturing group matched earlier, not the group's pattern applied a second time. This lets you write regular expressions that, for example, detect duplicated letters and words.

For example, (\w+)\1 matches a string that consists of a run of word characters immediately followed by that same run again:

    "banana".matches("(\\w+)\\1")   // ==> false
    "banabana".matches("(\\w+)\\1") // ==> true: bana is repeated

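In your regexp "(\\s*)<(\\s*\\w+\\s*)>(\\1)</(\\2)>", the back reference \1 requires that the whitespace between the opening and closing tags be identical to the whitespace captured before the opening tag. The content of <aaa> ("  \n\n    ") never equals any whitespace that can be captured immediately before <aaa>, so that element is not matched at all. <yyy></yyy> has nothing between its tags, so it matches only when group 1 captures the empty string, which is why the whitespace in front of it is left behind in your first assertion.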
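To see this in practice, here is a minimal sketch that prints what group 1 captures for the pattern from the question and then tries an alternative. The class name `BackReferenceDemo`, the helper `stripEmptyTags` and the `NAME_ONLY` pattern are illustrative names that do not appear in the question. `NAME_ONLY` is just one possible approach: it back-references only the tag name and lets each `\s*` match its own whitespace independently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackReferenceDemo {

    // Pattern from the question: \1 forces the whitespace between the tags
    // to be identical to the whitespace captured before the opening tag.
    static final Pattern PATTERN_A =
        Pattern.compile("(\\s*)<(\\s*\\w+\\s*)>(\\1)</(\\2)>");

    // Hypothetical alternative (not from the question): back-reference only
    // the tag name; every \s* matches its own whitespace independently.
    static final Pattern NAME_ONLY =
        Pattern.compile("\\s*<\\s*(\\w+)\\s*>\\s*</\\s*\\1\\s*>");

    // Removes every match of the given pattern from the input.
    static String stripEmptyTags(Pattern pattern, String xml) {
        return pattern.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml =
            "<root>\n" +
            "    <yyy></yyy>\n" +
            "    <yyy>456</yyy>\n" +
            "    <aaa>  \n\n" +
            "    </aaa>\n" +
            "</root>";

        // Print each match of the question's pattern and what group 1 captured.
        Matcher m = PATTERN_A.matcher(xml);
        while (m.find()) {
            System.out.println("match=[" + m.group() + "] group1=[" + m.group(1) + "]");
        }

        // Strip empty elements using the name-only back reference.
        System.out.println(stripEmptyTags(NAME_ONLY, xml));
    }
}
```

With the question's pattern, the only match is `<yyy></yyy>` with an empty group 1. With the name-only variant, both empty elements are removed along with their leading whitespace, and the back reference still ensures that the closing tag has the same name as the opening tag.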
