How to capture 'multiple' repeated groups with Regular Expressions
Question
I have the following text file that I would like to parse into individual fields:
host_group_web = ( )
host_group_lbnorth = ( lba050 lbhou002 lblon003 )
The fields that I would like to extract are in bold:
- host_group_**web** = ( )
- host_group_**lbnorth** = ( **lba050 lbhou002 lblon003** )
host_group_web has no items between the ( ), so that portion would be ignored.
I've named the first group nodegroup and the items between the ( ) nodes.
I am reading the file line by line, and storing the results for further processing.
This is the Go regex snippet I am using:
hostGroupLine := "host_group_lbnorth = ( lba050 lbhou002 lblon003 )"
hostGroupExp := regexp.MustCompile(`host_group_(?P<nodegroup>[[:alnum:]]+)\s*=\s*\(\s*(?P<nodes>[[:alnum:]]+\s*)`)
hostGroupMatch := hostGroupExp.FindStringSubmatch(hostGroupLine)
for i, name := range hostGroupExp.SubexpNames() {
	if i != 0 {
		fmt.Println("GroupName:", name, "GroupMatch:", hostGroupMatch[i])
	}
}
I get the following output, which is missing the rest of the matches for the nodes named group.
GroupName: nodegroup GroupMatch: lbnorth
GroupName: nodes GroupMatch: lba050
The snippet in the Go Playground:
1: https://play.golang.org/p/oOJ3Ex9aVf "Regex Problem"
My question is: how do I write a regex in Go that matches the nodegroup and all the nodes that may be in the line, e.g. lba050 lbhou002 lblon003?
The number of nodes varies, from zero to many.
Answer 1
Score: 5
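To capture the group name and all the node names, use a different pattern. A capture group holds only one substring per match, so the single nodes group in your pattern can never return more than one node. The pattern below instead matches either the group name or one node name, and FindAllStringSubmatch collects every match on the line. Named capture groups are not needed here, but you can use them if you want to.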
// First alternative captures the group name; second captures one node name followed by a space.
hostGroupExp := regexp.MustCompile(`host_group_([[:alnum:]]+)|([[:alnum:]]+) `)
hostGroupLine := "host_group_lbnorth = ( lba050 lbhou002 lblon003 )"
// One match per name: match 0 is the group name, the rest are nodes.
hostGroupMatch := hostGroupExp.FindAllStringSubmatch(hostGroupLine, -1)
fmt.Printf("GroupName: %s\n", hostGroupMatch[0][1])
for i := 1; i < len(hostGroupMatch); i++ {
	fmt.Printf(" Node: %s\n", hostGroupMatch[i][2])
}
See it in action in the playground.
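The snippet above indexes hostGroupMatch[0] directly, which panics on a line with no match. As a minimal sketch, the same pattern can be wrapped in a helper that guards against that case and returns the nodes as a slice. The name parseHostGroup and its return values are assumptions made for illustration; they are not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as in the answer: the first alternative captures the group
// name, the second captures one node name followed by a space.
var hostGroupExp = regexp.MustCompile(`host_group_([[:alnum:]]+)|([[:alnum:]]+) `)

// parseHostGroup is a hypothetical helper (not from the original answer).
// ok is false when the first match on the line is not a host_group_ name.
func parseHostGroup(line string) (name string, nodes []string, ok bool) {
	matches := hostGroupExp.FindAllStringSubmatch(line, -1)
	if len(matches) == 0 || matches[0][1] == "" {
		return "", nil, false
	}
	// Every later match that filled the second group is a node name.
	for _, m := range matches[1:] {
		if m[2] != "" {
			nodes = append(nodes, m[2])
		}
	}
	return matches[0][1], nodes, true
}

func main() {
	lines := []string{
		"host_group_lbnorth = ( lba050 lbhou002 lblon003 )",
		"host_group_web = ( )",
	}
	for _, line := range lines {
		name, nodes, ok := parseHostGroup(line)
		fmt.Println(name, nodes, ok)
	}
}
```

Like the original pattern, this sketch assumes every node name is followed by a space, as in the sample lines.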
Alternative:
You can also parse the way awk would: use a regular expression to split each line into tokens and print the tokens you need. This only works if every line has the same layout as in your example.
package main

import (
	"fmt"
	"regexp"
)

func printGroupName(tokens []string) {
	fmt.Printf("GroupName: %s\n", tokens[2])
	for i := 5; i < len(tokens)-1; i++ {
		fmt.Printf(" Node: %s\n", tokens[i])
	}
}

func main() {
	// regexp line splitter (either _ or space)
	r := regexp.MustCompile(`_| `)
	// lines to parse
	hostGroupLines := []string{
		"host_group_lbnorth = ( lba050 lbhou002 lblon003 )",
		"host_group_web = ( web44 web125 )",
		"host_group_web = ( web44 )",
		"host_group_lbnorth = ( )",
	}
	// split lines on regexp splitter and print result
	for _, line := range hostGroupLines {
		hostGroupMatch := r.Split(line, -1)
		printGroupName(hostGroupMatch)
	}
}
See it in action in the playground.
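Both approaches above depend on exact single-space layout. If the spacing can vary, one possible variant (a different technique from the two shown, offered only as a sketch) is to capture everything between the parentheses as a single nodes group and then split it with strings.Fields. The helper name splitHostGroup and the anchored pattern are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// lineExp captures the group name and the raw text between the parentheses.
// The pattern is an assumption based on the sample lines in the question.
var lineExp = regexp.MustCompile(`^host_group_(?P<nodegroup>[[:alnum:]]+)\s*=\s*\((?P<nodes>[^)]*)\)`)

// splitHostGroup is a hypothetical helper used for illustration.
func splitHostGroup(line string) (nodegroup string, nodes []string, ok bool) {
	m := lineExp.FindStringSubmatch(line)
	if m == nil {
		return "", nil, false
	}
	// m[1] is the nodegroup capture, m[2] is the text inside the parentheses.
	// strings.Fields splits on any run of whitespace and drops empty items.
	return m[1], strings.Fields(m[2]), true
}

func main() {
	lines := []string{
		"host_group_lbnorth = ( lba050 lbhou002 lblon003 )",
		"host_group_web = ( )",
	}
	for _, line := range lines {
		nodegroup, nodes, ok := splitHostGroup(line)
		fmt.Println(nodegroup, nodes, ok)
	}
}
```

This keeps the regular expression responsible only for the line structure and leaves the variable-length list to ordinary string splitting, so a group with zero nodes simply yields an empty slice.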
1: https://play.golang.org/p/k4qUc-Qw4y "playground"
2: https://www.gnu.org/software/gawk/manual/gawk.html
3: https://play.golang.org/p/0A1V-d3YrL