Golang: Why does regexp.FindAllStringSubmatch() return [][]string and not []string?

Question

I am fairly new to Go, and this is the first time I have had to deal with regexp.

I am a bit surprised that someregex.FindAllStringSubmatch("somestring", -1) returns a slice of slices, [][]string, instead of a simple slice of strings, []string.

Example:

someRegex, _ := regexp.Compile("^.*(mes).*$")
matches := someRegex.FindAllStringSubmatch("somestring", -1)
fmt.Println(matches) // prints [[somestring mes]]

What is the reason for this behavior? I can't figure it out.

Answer 1

Score: 8

func (*Regexp) FindAllStringSubmatch extracts matches together with their captured submatches.

A submatch is the part of the text matched by a part of the regex that is enclosed in a pair of unescaped parentheses (a so-called capturing group).
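As a rough illustration of which parentheses count, the sketch below uses Regexp.NumSubexp to compute how many strings each match slice will hold (one for the whole match plus one per capturing group). The three patterns and the helper name submatchCount are made up for this example and do not come from the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// submatchCount (a hypothetical helper) reports how many entries each
// match slice will have: one for the whole match plus one per capturing group.
func submatchCount(pattern string) int {
	re := regexp.MustCompile(pattern)
	return re.NumSubexp() + 1
}

func main() {
	// Illustrative patterns, chosen only to contrast three kinds of parentheses.
	patterns := []string{
		`a(b)c`,   // capturing group: 2 entries per match
		`a(?:b)c`, // non-capturing group: 1 entry per match
		`a\(b\)c`, // escaped parentheses match literal "(" and ")": 1 entry
	}
	for _, p := range patterns {
		fmt.Printf("%s -> %d entries per match\n", p, submatchCount(p))
	}
}
```

Only the first pattern adds a slot to the inner slice; the other two leave just the whole-match entry.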

In your case, ^.*(mes).*$ matches:

  • ^ - start of string
  • .* - any zero or more characters other than a newline, as many as possible
  • (mes) - Capturing group 1: a mes substring
  • .*$ - the rest of the string.

So the match value is the whole string, and it is the first value in the output. Since the pattern has a capturing group, the result must have a place for it, so mes is the second item in the list.

Since there may be more than one match, we need a list of lists: the outer list has one entry per match, and each inner list holds that match followed by its groups.
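To make the two levels concrete, here is a minimal sketch that walks the outer slice and picks group 1 out of each inner slice. The helper firstGroups and the second pattern and input are assumptions made for illustration; only the first call uses the pattern from the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstGroups (a hypothetical helper) returns the text captured by group 1
// for every match of pattern in s. The pattern must contain at least one
// capturing group, otherwise m[1] would be out of range.
func firstGroups(pattern, s string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, m := range re.FindAllStringSubmatch(s, -1) {
		// m[0] is the whole match; m[1] is capturing group 1.
		out = append(out, m[1])
	}
	return out
}

func main() {
	// The anchored pattern from the question can match only once.
	fmt.Printf("%q\n", firstGroups("^.*(mes).*$", "somestring")) // ["mes"]

	// An unanchored, illustrative pattern matches three times.
	fmt.Printf("%q\n", firstGroups(`(\w)(\d)`, "a1 b2 c3")) // ["a" "b" "c"]
}
```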

A better example is one that extracts several matches and submatches (and includes an optional group, too):

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Group 1 captures a vowel; group 2 optionally captures the following non-vowel.
	someRegex := regexp.MustCompile(`[^aouiye]([aouiye])([^aouiye])?`)
	matches := someRegex.FindAllStringSubmatch("somestri", -1)
	fmt.Printf("%q\n", matches) // [["som" "o" "m"] ["ri" "i" ""]]
}

The pattern [^aouiye]([aouiye])([^aouiye])? matches a non-vowel, a vowel, and an optional non-vowel, capturing the last two into separate groups #1 and #2 (the character class treats y as a vowel).

The result is [["som" "o" "m"] ["ri" "i" ""]]. There are 2 matches, and each contains the match value, the Group 1 value, and the Group 2 value. Since the ri match captured no text into Group 2 (([^aouiye])?), that entry is an empty string, but it is still present because the group is defined in the regex pattern.
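An empty string in the result does not say whether a group matched empty text or did not take part in the match at all. If that distinction matters, one option is FindAllStringSubmatchIndex, which reports byte offsets and uses -1 for a group that did not participate. The helper name groupParticipated below is an assumption for this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// groupParticipated (a hypothetical helper) reports, for each match of
// pattern in s, whether capturing group g took part in that match.
// g must not exceed the number of capturing groups in the pattern.
func groupParticipated(pattern, s string, g int) []bool {
	re := regexp.MustCompile(pattern)
	var out []bool
	for _, loc := range re.FindAllStringSubmatchIndex(s, -1) {
		// loc[2*g] and loc[2*g+1] are the start and end offsets of group g;
		// both are -1 when the group did not participate in the match.
		out = append(out, loc[2*g] >= 0)
	}
	return out
}

func main() {
	pattern := `[^aouiye]([aouiye])([^aouiye])?`
	// Group 2 participates in "som" but not in "ri".
	fmt.Println(groupParticipated(pattern, "somestri", 2)) // [true false]
}
```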

Answer 2

Score: 3

> FindAllStringSubmatch is the 'All' version of FindStringSubmatch; it
> returns a slice of all successive matches of the expression, as
> defined by the 'All' description in the package comment. A return
> value of nil indicates no match.

Docs.

To sum up: you get a slice of slices of strings because this is the 'All' version of FindStringSubmatch. FindStringSubmatch itself returns a single slice of strings ([]string) describing only the first match.
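The difference can be sketched side by side. The key=value pattern, the input string, and the wrapper names firstMatch and allMatches are invented for this example; the underlying calls are the two regexp methods discussed in this answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch (hypothetical wrapper) returns the first match and its groups.
func firstMatch(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindStringSubmatch(s)
}

// allMatches (hypothetical wrapper) returns up to n matches, each with its
// groups; a negative n means no limit.
func allMatches(pattern, s string, n int) [][]string {
	return regexp.MustCompile(pattern).FindAllStringSubmatch(s, n)
}

func main() {
	pattern := `(\w+)=(\d+)`
	s := "a=1 b=22 c=333"

	fmt.Printf("%q\n", firstMatch(pattern, s))     // ["a=1" "a" "1"]
	fmt.Printf("%q\n", allMatches(pattern, s, 2))  // [["a=1" "a" "1"] ["b=22" "b" "22"]]
	fmt.Println(len(allMatches(pattern, s, -1)))   // 3
	fmt.Println(allMatches(pattern, "???", -1) == nil) // true: no match
}
```

If only the first match is needed, FindStringSubmatch avoids the extra level of indexing.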

huangapple
  • Posted on August 24, 2017, 16:20:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/45856464.html