Go ReplaceAllString

Question

I read the example code on the golang.org website. Essentially, the code looks like this:

re := regexp.MustCompile("a(x*)b")
fmt.Println(re.ReplaceAllString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))

The output is like this:

-T-T-
--xx-
---
-W-xxW-

I understand the first output, but I don't understand the other three. Can someone explain results 2, 3, and 4 to me? Thanks.

Answer 1

Score: 7

The most intriguing line is fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W")). The docs say:

> Inside repl, $ signs are interpreted as in Expand

And Expand says:

> In the template, a variable is denoted by a substring of the form $name or ${name}, where name is a non-empty sequence of letters, digits, and underscores.
> A reference to an out of range or unmatched index or a name that is not present in the regular expression is replaced with an empty slice.
>
> In the $name form, name is taken to be as long as possible: $1x is equivalent to ${1x}, not ${1}x, and, $10 is equivalent to ${10}, not ${1}0.

So, in the 3rd replacement, $1W is treated as ${1W} and since this group is not initialized, an empty string is used for replacement.

When I say "the group is not initialized", I mean that the group is not defined in the regex pattern, so it was never populated during the match operation. Replacing means finding all matches and then substituting each one with the expanded replacement pattern. Backreferences ($xx constructs) are populated during the matching phase. There is no group named 1W in the pattern, so nothing was captured for it during matching, and the replacement phase substitutes an empty string.

The 2nd and 4th replacements are easy to understand and have been described in the other answers. $1 refers to the characters captured by the first capturing group (the subpattern enclosed in a pair of unescaped parentheses), and ${1} in example 4 refers to the same group, with the braces ending the name before the literal W.

You can think of {} as a means to disambiguate the replacement pattern.
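
As a small sketch of that disambiguation, the three spellings can be compared side by side on the question's pattern and input. The helper name replace is only for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// re is the pattern from the question.
var re = regexp.MustCompile("a(x*)b")

// replace applies one replacement template to the question's input.
func replace(template string) string {
	return re.ReplaceAllString("-ab-axxb-", template)
}

func main() {
	fmt.Println(replace("$1W"))   // name is "1W": no such group, prints ---
	fmt.Println(replace("${1W}")) // same reference spelled out, prints ---
	fmt.Println(replace("${1}W")) // group 1, then a literal W, prints -W-xxW-
}
```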

Now, if you need to make the results consistent, use a named capture (?P<1W>....):

re := regexp.MustCompile("a(?P<1W>x*)b")  // <= See here, pattern updated
fmt.Println(re.ReplaceAllString("-ab-axxb-", "T"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))
fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))

Results:

-T-T-
--xx-
--xx-
-W-xxW-

The 2nd and 3rd lines now produce the same output because the named group 1W is also the first group, so the numbered backreference $1 and the named backreference $1W point to the same captured text.
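
One way to check that the name and the number refer to the same group is to list the group names with SubexpNames. This is a sketch that assumes the same pattern and input as above; the helper name replace is only for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// re names its only capturing group "1W", as in the answer above.
var re = regexp.MustCompile("a(?P<1W>x*)b")

// replace applies one replacement template to the question's input.
func replace(template string) string {
	return re.ReplaceAllString("-ab-axxb-", template)
}

func main() {
	// SubexpNames is indexed by group number; entry 0 is the whole match.
	fmt.Printf("%q\n", re.SubexpNames()) // prints ["" "1W"]

	fmt.Println(replace("$1"))  // lookup by number, prints --xx-
	fmt.Println(replace("$1W")) // lookup by name, prints --xx-
}
```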

Answer 2

Score: 2

$number or $name refers to a subgroup of the regex, either by its index or by its name.

fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1"))

$1 is subgroup 1 of the regex, which is x*.

fmt.Println(re.ReplaceAllString("-ab-axxb-", "$1W"))

$1W: there is no subgroup named 1W, so every match is replaced with an empty string.

fmt.Println(re.ReplaceAllString("-ab-axxb-", "${1}W"))

$1 and ${1} are the same. Each match is replaced with the content of subgroup 1 followed by W.

For more information: https://golang.org/pkg/regexp/

Answer 3

Score: 1

$1 is shorthand for ${1}.

${1} is the value of the first (1) group, i.e. the content of the first pair of (). This group is (x*), i.e. any number of x, including none.

ReplaceAllString replaces every match. There are two matches. The first is ab, the second is axxb.
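
To see those two matches and what the group captures in each, a short sketch with FindAllStringSubmatch can help. The helper name submatches is only for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// submatches returns, for every match, the matched text followed by group 1.
func submatches(src string) [][]string {
	re := regexp.MustCompile("a(x*)b")
	return re.FindAllStringSubmatch(src, -1)
}

func main() {
	for _, m := range submatches("-ab-axxb-") {
		fmt.Printf("match %q, group 1 %q\n", m[0], m[1])
	}
	// Prints:
	// match "ab", group 1 ""
	// match "axxb", group 1 "xx"
}
```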

No. 2 replaces each match with the content of the group: "" for the first match and "xx" for the second.

No. 4 appends a "W" after the content of the group.

No. 3 is left as an exercise. Hint: the twelfth capturing group would be $12.
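
The hint about multi-digit group numbers can be tried out with a small sketch on the question's pattern, which has only one group. The helper name replace is only for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// re is the pattern from the question; it has a single capturing group.
var re = regexp.MustCompile("a(x*)b")

// replace applies one replacement template to the question's input.
func replace(template string) string {
	return re.ReplaceAllString("-ab-axxb-", template)
}

func main() {
	fmt.Println(replace("$10"))   // group 10 does not exist, prints ---
	fmt.Println(replace("${1}0")) // group 1, then a literal 0, prints -0-xx0-
}
```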

huangapple
  • Posted on January 8, 2016 at 17:02:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/34673039.html