How to split string in Go based on certain prefix and suffix?

Question

Let's say I have this big string:

>13242222160a06032c06cf00ca5c160bdc70102dfe0a12bc00a3b101000000cd01d60d0a13242222160a06032c0ccf00ca5bf10bdc74d029d05401fe0a12bc00a3b101000000d1e4270d0a1324222160a06032c1e0a12bc00a3b101000000d233ed0d0a

I want to split it into an array, with 1324 as the prefix and 0d0a as the suffix. The result should be an array of 3 elements:

> arr[0] = 13242222160a06032c06cf00ca5c160bdc70102dfe0a12bc00a3b101000000cd01d60d0a

> arr[1] = 13242222160a06032c0ccf00ca5bf10bdc74d029d05401fe0a12bc00a3b101000000d1e4270d0a

> arr[2] = 1324222160a06032c1e0a12bc00a3b101000000d233ed0d0a

Here's my code:

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {

	var testData = "13242222160a06032c06cf00ca5c160bdc70102dfe0a12bc00a3b101000000cd01d60d0a13242222160a06032c0ccf00ca5bf10bdc74d029d05401fe0a12bc00a3b101000000d1e4270d0a1324222160a06032c1e0a12bc00a3b101000000d233ed0d0a"

	re := regexp.MustCompile("^1324[0-9a-zA-Z]*0d0a")

	matches := re.FindAllString(testData, -1)

	for _, m := range matches {
		fmt.Printf("%s\n", m)
	}
}
```

It simply prints the same entire string, which very likely means my regex is wrong. What's the proper form?

Answer 1

Score: 3

Your regex has two issues.

The first is the caret (^). It anchors the match to the beginning of the string, so by definition you will get at most one result.

The second is that * is a greedy quantifier: it matches as many characters from the preceding class as it can. Every character in your input is in [0-9a-zA-Z], so the match runs on to the last 0d0a in the string instead of stopping at the first one. Go's regexp engine does not literally backtrack, but the outcome is the same as with a backtracking engine.

What you want is a reluctant (non-greedy) quantifier, *?. It matches as few characters as possible while still satisfying the rest of the regex.
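
The sketch below makes both issues concrete. It runs the question's pattern, and the same pattern with the caret removed, against the question's data. The helper countMatches is a hypothetical name, not from the question. Both patterns should report a single match spanning the entire input, because the greedy * carries each match to the final 0d0a.

```go
package main

import (
	"fmt"
	"regexp"
)

// testData is the input string from the question.
const testData = "13242222160a06032c06cf00ca5c160bdc70102dfe0a12bc00a3b101000000cd01d60d0a13242222160a06032c0ccf00ca5bf10bdc74d029d05401fe0a12bc00a3b101000000d1e4270d0a1324222160a06032c1e0a12bc00a3b101000000d233ed0d0a"

// countMatches (a hypothetical helper) returns how many non-overlapping
// matches pattern finds in s, plus the length of the first match (0 if none).
func countMatches(pattern, s string) (n, firstLen int) {
	matches := regexp.MustCompile(pattern).FindAllString(s, -1)
	if len(matches) > 0 {
		firstLen = len(matches[0])
	}
	return len(matches), firstLen
}

func main() {
	patterns := []string{
		"^1324[0-9a-zA-Z]*0d0a", // the question's pattern: anchored and greedy
		"1324[0-9a-zA-Z]*0d0a",  // caret removed, but * is still greedy
	}
	for _, p := range patterns {
		n, l := countMatches(p, testData)
		fmt.Printf("%q: %d match(es), first match is %d of %d bytes\n", p, n, l, len(testData))
	}
}
```

Removing the caret alone does not help: the greedy quantifier still swallows all three records in one match.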

Putting it together, your regex string should be "1324[0-9a-zA-Z]*?0d0a". I tested it in the Go Playground and it seems to produce the results you want: https://go.dev/play/p/qolk3vHNxKT
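
As a self-contained alternative to the playground link, the sketch below applies the corrected pattern with regexp's FindAllString. Passing n = -1 returns all non-overlapping matches. The name splitFrames is a hypothetical helper, not part of the answer. For the question's input it should return the three records listed in the question, in order.

```go
package main

import (
	"fmt"
	"regexp"
)

// testData is the input string from the question.
const testData = "13242222160a06032c06cf00ca5c160bdc70102dfe0a12bc00a3b101000000cd01d60d0a13242222160a06032c0ccf00ca5bf10bdc74d029d05401fe0a12bc00a3b101000000d1e4270d0a1324222160a06032c1e0a12bc00a3b101000000d233ed0d0a"

// framePattern is the regex from this answer: no ^ anchor, and a
// non-greedy *? so each match ends at the first "0d0a" after a "1324".
// Compiling it once at package level avoids recompiling on every call.
var framePattern = regexp.MustCompile("1324[0-9a-zA-Z]*?0d0a")

// splitFrames (a hypothetical helper name) returns every non-overlapping
// prefix...suffix match in s, in order of appearance.
func splitFrames(s string) []string {
	return framePattern.FindAllString(s, -1)
}

func main() {
	for i, m := range splitFrames(testData) {
		fmt.Printf("arr[%d] = %s\n", i, m)
	}
}
```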

Answer 2

Score: 1

It is much simpler to use strings.Split on the keyword 1324 and then add it back as a prefix to each entry.

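The results value is a slice of the substrings between occurrences of the delimiter; Split removes the delimiter itself. Iterate over it once, skip the empty entries (the first one is empty because the string starts with the delimiter), and prepend the delimiter to each remaining entry to get the desired result.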

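```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	const prefix = "1324"
	var testData = "13242222160a06032c06cf00ca5c160bdc70102dfe0a12bc00a3b101000000cd01d60d0a13242222160a06032c0ccf00ca5bf10bdc74d029d05401fe0a12bc00a3b101000000d1e4270d0a1324222160a06032c1e0a12bc00a3b101000000d233ed0d0a"

	// Split removes every occurrence of the prefix; because testData starts
	// with it, the first element of results is an empty string.
	results := strings.Split(testData, prefix)

	var output []string
	for _, part := range results {
		if len(part) > 0 {
			// Re-attach the prefix that Split stripped off.
			output = append(output, prefix+part)
		}
	}

	for i, s := range output {
		fmt.Printf("arr[%d] = %s\n", i, s)
	}
}
```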

Note that on my M1 MacBook Pro, the Split() example performed far better than the regex option when run with Go's benchmarks.

huangapple
  • This article was published on October 6, 2022 at 13:08:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/73968877.html