March 16, 2020, 19:15:09

How can Java's Matcher.group(int) avoid matching the contents of nested parentheses?

Question

I have a string like

String str = "美国临时申请No.62004615";

And a regex like

String regex = "(((美国|PCT|加拿大){0,1})([\\u4E00-\\u9FA5]{1,8})((NO.|NOS.){1})([\\d]{5,}))";

And other code is

    Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(str);
    while (matcher.find()) {
        System.out.println("1:" + matcher.group(1) + "\n"
                + "2:" + matcher.group(2) + "\n"
                + "3:" + matcher.group(3) + "\n"
                + "4:" + matcher.group(4) + "\n"
                + "5:" + matcher.group(5) + "\n"
                + "6:" + matcher.group(6) + "\n"
                + "7:" + matcher.group(7));
    }

I know that parentheses () are used to group parts of a regex, and that group 1 is the outermost group.
The second group is ((美国|PCT|加拿大){0,1}), to match "美国", "PCT", or "加拿大".
The third group is ([\u4E00-\u9FA5]{1,8}), to match one to eight Chinese characters.
The fourth group is ((NO.|NOS.){1}), to match NO. or NOS.
The fifth group is ([\d]{5,}), to match the number.
But the console prints:

1:美国临时申请No.62004615
2:美国
3:美国
4:临时申请
5:No.
6:No.
7:62004615

Group 2 is the same as group 3, and group 5 is the same as group 6.
It seems that group 3 matches the nested parentheses inside the outer parentheses again. Is there a way to capture only the outermost parentheses?
The ideal result would be:

1:美国临时申请No.62004615
2:美国
3:临时申请
4:No.
5:62004615

Answer 1

Score: 2

It sounds like you want a non-capturing group. From the Pattern documentation:

> (?:X)        X, as a non-capturing group

So, change this:

(美国|PCT|加拿大)

to this:

(?:美国|PCT|加拿大)

… and then it will not be represented as a group at all in the Matcher.
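
As a minimal sketch of that change applied to the whole pattern from the question, the example below makes both inner alternations non-capturing. The class name NonCapturingDemo and the helper method groups are hypothetical names chosen for illustration; the input string and the rest of the regex are taken from the question unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {
    // Same pattern as in the question, except that the two inner
    // alternations are written as (?:...) so they no longer get a group number.
    static final String REGEX =
            "(((?:美国|PCT|加拿大){0,1})([\\u4E00-\\u9FA5]{1,8})((?:NO.|NOS.){1})([\\d]{5,}))";

    // Returns groups 1..groupCount() of the first match, or an empty list if nothing matches.
    static List<String> groups(String input) {
        Matcher matcher = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> result = new ArrayList<>();
        if (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                result.add(matcher.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> found = groups("美国临时申请No.62004615");
        for (int i = 0; i < found.size(); i++) {
            System.out.println((i + 1) + ":" + found.get(i));
        }
        // Prints:
        // 1:美国临时申请No.62004615
        // 2:美国
        // 3:临时申请
        // 4:No.
        // 5:62004615
    }
}
```

With both inner groups made non-capturing, groupCount() drops from 7 to 5 and the numbering matches the ideal result in the question.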

Some side notes:

{0,1} is the same as writing ?.
{1} does nothing and can be removed entirely.
[\\d] is the same as just \\d.
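
The side notes can be checked with a short sketch that compares the non-capturing pattern in its original verbose form against a version with the three simplifications applied. The class name SimplifiedRegexDemo, the constant names, and the two extra sample inputs are assumptions made for illustration. Once {1} is removed, the group around NO.|NOS. can be an ordinary capturing group, because nothing is nested inside it any more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimplifiedRegexDemo {
    // Non-capturing version that still uses {0,1}, {1} and [\d].
    static final String VERBOSE =
            "(((?:美国|PCT|加拿大){0,1})([\\u4E00-\\u9FA5]{1,8})((?:NO.|NOS.){1})([\\d]{5,}))";

    // The same pattern with the side notes applied: ? instead of {0,1},
    // {1} dropped, and \d instead of [\d].
    static final String SIMPLIFIED =
            "(((?:美国|PCT|加拿大)?)([\\u4E00-\\u9FA5]{1,8})(NO.|NOS.)(\\d{5,}))";

    // Returns groups 1..groupCount() of the first match, or an empty list if nothing matches.
    static List<String> groups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> result = new ArrayList<>();
        if (matcher.find()) {
            for (int i = 1; i <= matcher.groupCount(); i++) {
                result.add(matcher.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The first input is from the question; the other two are made-up samples.
        String[] inputs = {"美国临时申请No.62004615", "临时申请NOS.1234567", "PCT申请no.55555"};
        for (String input : inputs) {
            List<String> verbose = groups(VERBOSE, input);
            List<String> simplified = groups(SIMPLIFIED, input);
            System.out.println(input + " -> same groups: " + verbose.equals(simplified));
        }
    }
}
```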
