November 5, 2016, 14:00:43 · go

How to capture 'multiple' repeated groups with Regular Expressions

Question

I have the following text file that I would like to parse to get the individual fields:

host_group_web = ( )
host_group_lbnorth = ( lba050 lbhou002 lblon003 )

The fields that I would like to extract are the group name after host_group_ (lbnorth) and each node name inside the parentheses (lba050, lbhou002, lblon003):

host_group_web = ( )
host_group_lbnorth = ( lba050 lbhou002 lblon003 )

host_group_web has no items between the ( ), so that portion would be ignored.

I've named the first group nodegroup and the items between the ( ) nodes.

I am reading the file line by line, and storing the results for further processing.

In Go, this is the regex snippet I am using:

hostGroupLine := "host_group_lbnorth = ( lba050 lbhou002 lblon003 )"
hostGroupExp := regexp.MustCompile(`host_group_(?P<nodegroup>[[:alnum:]]+)\s*=\s*\(\s*(?P<nodes>[[:alnum:]]+\s*)`)
hostGroupMatch := hostGroupExp.FindStringSubmatch(hostGroupLine)

for i, name := range hostGroupExp.SubexpNames() {
  if i != 0 {
    fmt.Println("GroupName:", name, "GroupMatch:", hostGroupMatch[i])
  }
}

I get the following output, which is missing the rest of the matches for the nodes named group.

GroupName: nodegroup GroupMatch: lbnorth
GroupName: nodes GroupMatch: lba050

The snippet in the Go Playground [1]

1: https://play.golang.org/p/oOJ3Ex9aVf "Regex Problem"

My question is: how do I write a regex in Go that matches the nodegroup and all the nodes that may be in the line, e.g. lba050 lbhou002 lblon003?
The number of nodes will vary, from zero to many.

Answer 1

Score: 5

If you want to capture the group name and all possible node names, you should work with a different regex pattern. This one should capture all of them in one go. No need to work with named capture groups but you can if you want to.

// Each match is either the group name (submatch 1) or a node name
// followed by a space (submatch 2).
hostGroupExp := regexp.MustCompile(`host_group_([[:alnum:]]+)|([[:alnum:]]+) `)

hostGroupLine := "host_group_lbnorth = ( lba050 lbhou002 lblon003 )"
hostGroupMatch := hostGroupExp.FindAllStringSubmatch(hostGroupLine, -1)

// The first match holds the group name; every later match holds one node.
fmt.Printf("GroupName: %s\n", hostGroupMatch[0][1])
for i := 1; i < len(hostGroupMatch); i++ {
    fmt.Printf("  Node: %s\n", hostGroupMatch[i][2])
}

See it in action in the playground [1].
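
As background on why a different pattern is needed: the pattern in the question contains the nodes group only once, so it can match only the first node. Repeating the group does not help either, because Go's regexp package reports a single value per capture group, and for a group inside a repetition that value is the last iteration. The sketch below illustrates this; the pattern and the helper name lastRepeatedNode are assumptions made for illustration and are not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// repeatedRe repeats the node capture inside a non-capturing group.
// Illustrative pattern, not taken from the original post.
var repeatedRe = regexp.MustCompile(
	`host_group_([[:alnum:]]+)\s*=\s*\(\s*(?:([[:alnum:]]+)\s*)*\)`)

// lastRepeatedNode (hypothetical helper) returns the group name and
// whatever the repeated node capture holds after the match.
func lastRepeatedNode(line string) (nodegroup, node string, ok bool) {
	m := repeatedRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	group, node, ok := lastRepeatedNode(
		"host_group_lbnorth = ( lba050 lbhou002 lblon003 )")
	// Only the last repetition of the node capture survives.
	fmt.Println(group, node, ok) // lbnorth lblon003 true
}
```

This is why the approaches in this answer collect the nodes with FindAllStringSubmatch or by splitting the line, instead of relying on one repeated group.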

Alternative:

You can also work the way awk [2] would do the parsing: use a regular expression to split each line into tokens and print the tokens you need. Of course, the line layout must be the same as the one given in your example.

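package main

import (
    "fmt"
    "regexp"
)

// printGroupName expects tokens in the layout
// [host group <name> = ( <node>... )], so the group name is tokens[2]
// and the nodes run from tokens[5] up to the closing parenthesis.
func printGroupName(tokens []string) {
    fmt.Printf("GroupName: %s\n", tokens[2])
    for i := 5; i < len(tokens)-1; i++ {
        fmt.Printf("  Node: %s\n", tokens[i])
    }
}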

func main() {

    // regexp line splitter (either _ or space)
    r := regexp.MustCompile(`_| `)

    // lines to parse
    hostGroupLines := []string{
        "host_group_lbnorth = ( lba050 lbhou002 lblon003 )",
        "host_group_web = ( web44 web125 )",
        "host_group_web = ( web44 )",
        "host_group_lbnorth = ( )",
    }

    // split lines on regexp splitter and print result
    for _, line := range hostGroupLines {
        hostGroupMatch := r.Split(line, -1)
        printGroupName(hostGroupMatch)
    }

}

See it in action in the playground [3].
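
A further variation, offered as a sketch and not as part of the original answer: capture everything between the parentheses as one group, then split that text on whitespace with strings.Fields. The names hostGroupRe and parseHostGroup are hypothetical, and the pattern assumes that node names never contain a closing parenthesis.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hostGroupRe captures the group name (submatch 1) and the raw text
// between the parentheses (submatch 2). Illustrative pattern.
var hostGroupRe = regexp.MustCompile(
	`^host_group_([[:alnum:]]+)\s*=\s*\(([^)]*)\)`)

// parseHostGroup (hypothetical helper) returns the group name and its
// nodes. An empty "( )" yields a slice of length zero.
func parseHostGroup(line string) (nodegroup string, nodes []string, ok bool) {
	m := hostGroupRe.FindStringSubmatch(line)
	if m == nil {
		return "", nil, false
	}
	// strings.Fields splits on runs of whitespace and drops empty items.
	return m[1], strings.Fields(m[2]), true
}

func main() {
	lines := []string{
		"host_group_lbnorth = ( lba050 lbhou002 lblon003 )",
		"host_group_web = ( )",
	}
	for _, line := range lines {
		group, nodes, ok := parseHostGroup(line)
		if !ok {
			continue // skip lines that are not host group definitions
		}
		fmt.Printf("GroupName: %s\n", group)
		for _, n := range nodes {
			fmt.Printf("  Node: %s\n", n)
		}
	}
}
```

Compared with splitting the whole line on `_` or a space, this version does not depend on fixed token positions, so it tolerates extra or missing spaces around the equals sign and the parentheses.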

1: https://play.golang.org/p/k4qUc-Qw4y "playground"
2: https://www.gnu.org/software/gawk/manual/gawk.html
3: https://play.golang.org/p/0A1V-d3YrL
