Golang Regexp Named Groups and Submatches

Question

I am trying to match a regular expression and get the capturing group name for the match. This works when the regular expression only matches the string once, but if it matches the string more than once, SubexpNames doesn't return the duplicated names.

Here's an example:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    re := regexp.MustCompile("(?P<first>[a-zA-Z]+) ")
    fmt.Printf("%q\n", re.SubexpNames())
    fmt.Printf("%q\n", re.FindAllStringSubmatch("Alan Turing ", -1))
}

The output is:

["" "first"]
[["Alan " "Alan"] ["Turing " "Turing"]]

Is it possible to get the capturing group name for each submatch?

Answer 1

Score: 12

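Group names and positions are fixed when the pattern is compiled, so SubexpNames lists each name once. Index it with a group's position inside each match: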

re := regexp.MustCompile("(?P<first>[a-zA-Z]+) ")
groupNames := re.SubexpNames()
for matchNum, match := range re.FindAllStringSubmatch("Alan Turing ", -1) {
	for groupIdx, group := range match {
		name := groupNames[groupIdx]
		if name == "" {
			name = "*"
		}
		fmt.Printf("#%d text: '%s', group: '%s'\n", matchNum, group, name)
	}
}
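
As a complete program, the same idea can be sketched by collecting each match into a map keyed by group name. The helper name namedMatches is an assumption made for illustration and is not part of the regexp package; if a pattern repeats a name, the later group overwrites the earlier one in this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

// namedMatches returns one map per match, keyed by capture-group name.
// Unnamed groups (including group 0, the whole match) are skipped.
// namedMatches is a hypothetical helper, not part of the regexp package.
func namedMatches(pattern, input string) []map[string]string {
	re := regexp.MustCompile(pattern)
	names := re.SubexpNames() // names[i] belongs to match[i] in every match
	var out []map[string]string
	for _, match := range re.FindAllStringSubmatch(input, -1) {
		groups := make(map[string]string)
		for i, text := range match {
			if names[i] != "" {
				groups[names[i]] = text
			}
		}
		out = append(out, groups)
	}
	return out
}

func main() {
	for i, groups := range namedMatches(`(?P<first>[a-zA-Z]+) `, "Alan Turing ") {
		fmt.Printf("match %d: first=%q\n", i, groups["first"])
	}
}
```

Because the name slice is shared by all matches, it is fetched once outside the loop.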

Answer 2

Score: 8

A dedicated method for this was proposed in "proposal: regexp: add (*Regexp).SubexpIndex #32420". It was a candidate for Go 1.14 (Q1 2020) and was included in Go 1.15 (August 2020) with commit 782fcb4.

// SubexpIndex returns the index of the first subexpression with the given name,
// or else -1 if there is no subexpression with that name.
//
// Note that multiple subexpressions can be written using the same name, as in
// (?P<bob>a+)(?P<bob>b+), which declares two subexpressions named "bob".
// In this case SubexpIndex returns the index of the leftmost such subexpression
// in the regular expression.
func (*Regexp) SubexpIndex(name string) int

This is discussed in CL 187919.
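
The two rules in the doc comment (leftmost group wins, -1 for a missing name) can be checked with a short sketch. It assumes Go 1.15 or later; indexOf is a hypothetical helper introduced only for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// indexOf compiles pattern and reports the SubexpIndex of name.
// indexOf is a hypothetical helper for illustration only.
func indexOf(pattern, name string) int {
	return regexp.MustCompile(pattern).SubexpIndex(name)
}

func main() {
	// The pattern from the doc comment: two groups share the name "bob".
	pattern := `(?P<bob>a+)(?P<bob>b+)`
	fmt.Println(indexOf(pattern, "bob"))   // index of the leftmost "bob" group
	fmt.Println(indexOf(pattern, "alice")) // no group has this name
}
```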

re := regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)
fmt.Println(re.MatchString("Alan Turing"))
matches := re.FindStringSubmatch("Alan Turing")
lastIndex := re.SubexpIndex("last")
fmt.Printf("last => %d\n", lastIndex)
fmt.Println(matches[lastIndex])

// Output:
// true
// last => 2
// Turing
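
Applied to the multi-match case from the question, SubexpIndex can be looked up once and reused for every match. The following is a minimal sketch under that assumption (Go 1.15 or later); groupValues is a hypothetical helper name, and the -1 guard is there because indexing a match with -1 would panic.

```go
package main

import (
	"fmt"
	"regexp"
)

// groupValues returns the text captured by the named group in every match
// of pattern within input, or nil if the pattern has no group of that name.
// groupValues is a hypothetical helper; SubexpIndex requires Go 1.15 or later.
func groupValues(pattern, name, input string) []string {
	re := regexp.MustCompile(pattern)
	idx := re.SubexpIndex(name)
	if idx < 0 {
		return nil // no such group: match[-1] would panic
	}
	var values []string
	for _, match := range re.FindAllStringSubmatch(input, -1) {
		values = append(values, match[idx])
	}
	return values
}

func main() {
	pattern := `(?P<first>[a-zA-Z]+) `
	fmt.Printf("%q\n", groupValues(pattern, "first", "Alan Turing "))
	fmt.Printf("%q\n", groupValues(pattern, "last", "Alan Turing "))
}
```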

Answer 3

Score: 1

An example of executing the ping command under Linux and parsing its output. The code defines a Result struct that holds the ping statistics. PingHostOrIp takes a host name or IP address, a ping count, and a timeout, runs ping through exec.Command, parses the output with a regular expression, and returns the parsed values in a Result.

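package main

import (
	"errors"
	"fmt"
	"os/exec"
	"regexp"
	"strconv"
	"time"
)

// Result holds the summary statistics reported by ping.
type Result struct {
	AvgTime     time.Duration
	MaxTime     time.Duration
	MinTime     time.Duration
	MDevTime    time.Duration
	Transmitted int
	Received    int
}

// PingHostOrIp runs the Linux ping command in quiet mode and parses its summary.
func PingHostOrIp(hostOrIp string, pingCount int, timeout time.Duration) (*Result, error) {
	timeoutSec := int(timeout.Seconds())
	// Pass each flag and its value as separate arguments.
	outBuff, err := exec.Command("ping", hostOrIp, "-q", "-c", strconv.Itoa(pingCount), "-w", strconv.Itoa(timeoutSec)).Output()
	if err != nil {
		return nil, err
	}
	out := string(outBuff)
	reg := regexp.MustCompile(`(\d+) packets transmitted, (\d+) received, \d+% packet loss, time .+\nrtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms`)
	subMatches := reg.FindStringSubmatch(out)
	if subMatches == nil {
		// The output did not have the expected shape; return it as the error text.
		return nil, errors.New(out)
	}
	// Submatch order follows the pattern: transmitted, received, min, avg, max, mdev.
	res := Result{
		AvgTime:     toDuration(subMatches[4]),
		MaxTime:     toDuration(subMatches[5]),
		MinTime:     toDuration(subMatches[3]),
		MDevTime:    toDuration(subMatches[6]),
		Transmitted: toInt(subMatches[1]),
		Received:    toInt(subMatches[2]),
	}
	return &res, nil
}

// toInt converts a decimal string to an int and panics on malformed input.
func toInt(str string) int {
	i, err := strconv.Atoi(str)
	if err != nil {
		panic(err)
	}
	return i
}

// toDuration converts a value in milliseconds, as printed by ping, to a time.Duration.
func toDuration(str string) time.Duration {
	f, err := strconv.ParseFloat(str, 64)
	if err != nil {
		panic(err)
	}
	return time.Duration(f * float64(time.Millisecond))
}

func main() {
	// Example invocation; the host, count, and timeout are arbitrary.
	res, err := PingHostOrIp("127.0.0.1", 3, 5*time.Second)
	if err != nil {
		fmt.Println("ping failed:", err)
		return
	}
	fmt.Printf("%+v\n", *res)
}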
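
To tie this back to the question, the positional indices above could be swapped for named groups. The following is only a sketch: the group names, the parseSummary helper, and the sample text are assumptions made for illustration, and the sample is hand-written in the shape the pattern expects rather than captured from a real ping run.

```go
package main

import (
	"fmt"
	"regexp"
)

// pingSummary is the pattern from the answer above with a name on each group.
// The group names are chosen for this sketch.
var pingSummary = regexp.MustCompile(
	`(?P<transmitted>\d+) packets transmitted, (?P<received>\d+) received, \d+% packet loss, time .+\n` +
		`rtt min/avg/max/mdev = (?P<min>[\d.]+)/(?P<avg>[\d.]+)/(?P<max>[\d.]+)/(?P<mdev>[\d.]+) ms`)

// parseSummary returns the named fields of a ping summary, or nil when the
// text does not match. parseSummary is a hypothetical helper.
func parseSummary(out string) map[string]string {
	match := pingSummary.FindStringSubmatch(out)
	if match == nil {
		return nil
	}
	fields := make(map[string]string)
	for i, name := range pingSummary.SubexpNames() {
		if name != "" {
			fields[name] = match[i]
		}
	}
	return fields
}

func main() {
	// Hand-written sample text, not real ping output.
	sample := "4 packets transmitted, 4 received, 0% packet loss, time 3004ms\n" +
		"rtt min/avg/max/mdev = 0.045/0.052/0.060/0.006 ms"
	fields := parseSummary(sample)
	fmt.Println(fields["transmitted"], fields["received"], fields["avg"])
}
```

With names in place, reordering the groups in the pattern no longer requires renumbering the lookups.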

Posted by huangapple on January 22, 2016 at 22:55:15
Link: https://go.coder-hub.com/34949554.html