Golang: Why does regexp.FindAllStringSubmatch() return [][]string and not []string?
Question
I am fairly new to Go, and this is the first time I have had to deal with regexp.
I am a bit surprised that `someregex.FindAllStringSubmatch("somestring", -1)` returns a slice of slices, `[][]string`, instead of a simple slice of strings, `[]string`.
Example:
<!-- language: golang -->
someRegex, _ := regexp.Compile("^.*(mes).*$")
matches := someRegex.FindAllStringSubmatch("somestring", -1)
fmt.Println(matches) // logs [[somestring mes]]
What is the reason for this behavior? I can't figure it out.
Answer 1
Score: 8
`func (*Regexp) FindAllStringSubmatch` extracts matches together with their captured submatches.
A submatch is the part of the text matched by the part of the regex enclosed in a pair of unescaped parentheses (a so-called capturing group).
In your case, `^.*(mes).*$` matches:
- `^` - start of string
- `.*` - zero or more characters (other than a newline), as many as possible
- `(mes)` - capturing group 1: a `mes` substring
- `.*$` - the rest of the string.
So the match value is the whole string, and it is the first value in the output. Then, since there is a capturing group, there must be a place for it in the results, so `mes` is placed as the second item in the list.
Since there may be more than one match, we need a list of lists.
A better example may be one with several matches and submatches (and an optional group, too):
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// MustCompile panics on an invalid pattern instead of silently discarding the error.
	someRegex := regexp.MustCompile(`[^aouiye]([aouiye])([^aouiye])?`)
	matches := someRegex.FindAllStringSubmatch("somestri", -1)
	fmt.Printf("%q\n", matches)
}
The pattern `[^aouiye]([aouiye])([^aouiye])?` matches a non-vowel, a vowel, and an optional non-vowel, capturing the last two into separate groups #1 and #2.
The result is `[["som" "o" "m"] ["ri" "i" ""]]`. There are 2 matches, and each contains the match value, the Group 1 value and the Group 2 value. Since the `ri` match has no text captured into Group 2 (`([^aouiye])?`), that entry is an empty string, but it is still present because the group is defined in the regex pattern.
Answer 2
Score: 3
> FindAllStringSubmatch is the 'All' version of FindStringSubmatch; it
> returns a slice of all successive matches of the expression, as
> defined by the 'All' description in the package comment. A return
> value of nil indicates no match.
Docs.
To sum up: you get a slice of slices of strings (`[][]string`) because this is the 'All' version of `FindStringSubmatch`. `FindStringSubmatch` itself returns a single slice of strings (`[]string`) for the first match only.