Regex: GetGroupNames returning more groups than expected
Question

I can't understand what is going on with my regex.
I have the following code:

string patternRegex = "^(?<GROUP_1>[1-4][0-9]|7[3-9]|8[0-9])(?<GROUP_2>(\\d{5}))$";
Regex regex = new Regex(patternRegex, RegexOptions.Compiled);
var groupNames = regex.GetGroupNames();

I expect GetGroupNames() to return two groups (GROUP_1 and GROUP_2), but in this case it returns four:

  1. 0
  2. 1
  3. GROUP_1
  4. GROUP_2

When I try to match the string 1395614, the group values are:

  1. 1395614
  2. 95614
  3. 13
  4. 95614

I already googled it and read somewhere that group 0 stands for the full match.

The GROUP_1 and GROUP_2 values are also correct.

But what does group 1 stand for? (I am not asking about GROUP_1 here.)

.NET Fiddle: https://dotnetfiddle.net/Bfgv9s

Answer 1

Score: 4

To avoid the default unnamed capturing group with index 1, rewrite your regex as:

string patternRegex = "^(?<GROUP_1>(?:[1-4][0-9]|7[3-9]|8[0-9]))(?<GROUP_2>(?:\\d{5}))$";

For the first capturing group, GROUP_1, you can use (?:[1-4][0-9]|7[3-9]|8[0-9]) instead of [1-4][0-9]|7[3-9]|8[0-9]. This is optional: the original had no unnamed parentheses there, so it does not change which groups are returned.
(?:...) is a non-capturing group. It groups part of the pattern without creating a capturing group, so the match for that part is not assigned to a group of its own.

For the second capturing group, GROUP_2, use (?:\\d{5}) instead of (\\d{5}). The unnamed parentheses inside GROUP_2 are what create the extra group 1.
Again, (?:\\d{5}) is a non-capturing group.
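
To see the difference, here is a minimal sketch that prints the group names for the pattern from the question and for the rewritten pattern above. The class name GroupNamesDemo and the helper Names are made up for illustration. The last line uses RegexOptions.ExplicitCapture, which is not mentioned in the answer; it is shown only as an assumption-free alternative from the .NET API that makes unnamed parentheses non-capturing without editing the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNamesDemo
{
    // Pattern from the question: GROUP_2 wraps an extra unnamed group (\d{5}).
    public const string Original =
        "^(?<GROUP_1>[1-4][0-9]|7[3-9]|8[0-9])(?<GROUP_2>(\\d{5}))$";

    // Rewritten pattern from the answer: inner groups are non-capturing.
    public const string Rewritten =
        "^(?<GROUP_1>(?:[1-4][0-9]|7[3-9]|8[0-9]))(?<GROUP_2>(?:\\d{5}))$";

    // Hypothetical helper: returns the group names joined with ", ".
    public static string Names(string pattern, RegexOptions options = RegexOptions.None)
    {
        return string.Join(", ", new Regex(pattern, options).GetGroupNames());
    }

    public static void Main()
    {
        Console.WriteLine(Names(Original));   // 0, 1, GROUP_1, GROUP_2
        Console.WriteLine(Names(Rewritten));  // 0, GROUP_1, GROUP_2
        // ExplicitCapture: only named groups capture, pattern left unchanged.
        Console.WriteLine(Names(Original, RegexOptions.ExplicitCapture)); // 0, GROUP_1, GROUP_2
    }
}
```

Group 0 (the full match) is always present, so three names is the minimum for a pattern with two named groups.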

Answer 2

Score: 3

Indeed, group 0 is the full match.
You also get a group for each pair of parentheses.

In your pattern, the inner (\d{5}) is an unnamed capturing group, so it is captured as group "1".

You can access the captured groups this way:

var match = regex.Match("1395614");
var group1 = match.Groups["GROUP_1"].Value;

To prevent that group from being captured, add ?: (which marks a non-capturing group) to the (\d{5}) group.

It should look like this: (?:\d{5}).
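
As a rough illustration of why the unnamed group ends up as "1": in .NET, unnamed groups are numbered first, and named groups get their numbers after them. The sketch below uses the pattern and input from the question; the class name GroupNumberingDemo and the helpers ValueOf and NumberOf are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberingDemo
{
    // Pattern from the question, with the unnamed (\d{5}) inside GROUP_2.
    public const string Pattern =
        "^(?<GROUP_1>[1-4][0-9]|7[3-9]|8[0-9])(?<GROUP_2>(\\d{5}))$";

    // Hypothetical helper: value of a numbered group for the given input.
    public static string ValueOf(int groupNumber, string input)
    {
        return new Regex(Pattern).Match(input).Groups[groupNumber].Value;
    }

    // Hypothetical helper: the number that .NET assigns to a named group.
    public static int NumberOf(string groupName)
    {
        return new Regex(Pattern).GroupNumberFromName(groupName);
    }

    public static void Main()
    {
        // The unnamed inner group is numbered first, so it is group 1.
        Console.WriteLine(ValueOf(1, "1395614")); // 95614
        // Named groups are numbered after all unnamed groups.
        Console.WriteLine(NumberOf("GROUP_1"));   // 2
        Console.WriteLine(NumberOf("GROUP_2"));   // 3
        // Group 2 is therefore GROUP_1.
        Console.WriteLine(ValueOf(2, "1395614")); // 13
    }
}
```

Accessing groups by name, as in match.Groups["GROUP_1"], is not affected by this numbering, so the extra group only matters if you enumerate all groups or index them by number.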

huangapple
  • Posted on July 31, 2023 at 19:51:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76803339.html