How to split a string in Go based on a certain prefix and suffix?
Question
Let's say I have this big string:
>13242222160a06032c06cf00ca5c160bdc70102dfe0a12bc00a3b101000000cd01d60d0a13242222160a06032c0ccf00ca5bf10bdc74d029d05401fe0a12bc00a3b101000000d1e4270d0a1324222160a06032c1e0a12bc00a3b101000000d233ed0d0a
I want to split it into an array, with 1324 as the prefix and 0d0a as the suffix. The result should be an array of 3 elements:
> arr[0] = 13242222160a06032c06cf00ca5c160bdc70102dfe0a12bc00a3b101000000cd01d60d0a
> arr[1] = 13242222160a06032c0ccf00ca5bf10bdc74d029d05401fe0a12bc00a3b101000000d1e4270d0a
> arr[2] = 1324222160a06032c1e0a12bc00a3b101000000d233ed0d0a
Here's my code:
    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        var testData = "13242222160a06032c06cf00ca5c160bdc70102dfe0a12bc00a3b101000000cd01d60d0a13242222160a06032c0ccf00ca5bf10bdc74d029d05401fe0a12bc00a3b101000000d1e4270d0a1324222160a06032c1e0a12bc00a3b101000000d233ed0d0a"
        re := regexp.MustCompile("^1324[0-9a-zA-Z]*0d0a")
        matches := re.FindAllString(testData, -1)
        for _, m := range matches {
            fmt.Printf("%s\n", m)
        }
    }
It simply prints the same entire string, which very likely means my regex is wrong. What's the proper form?
Answer 1
Score: 3
Your regex has two issues. The caret (^) anchors the match to the beginning of the string, so by definition you will only get one result. The other issue is that * is a greedy quantifier, meaning it matches as many characters from the preceding character set as it can. As a result, the match runs to the last 0d0a in the string instead of stopping at the first one. What you want is a reluctant (lazy) quantifier, written *?, which matches only the minimum number of characters needed to satisfy the regex.
Putting it together, your regex string should be "1324[0-9a-zA-Z]*?0d0a". I tested it in the Go Playground and it seems to get the results that you want: https://go.dev/play/p/qolk3vHNxKT
Answer 2
Score: 1
It is much simpler to call strings.Split on the keyword 1324 and then prefix it to each entry afterwards.
results is a slice of strings, split on the delimiter provided. Iterate over it once, prepending the delimiter to each non-empty entry, to get the desired result:
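    package main

    import (
        "fmt"
        "strings"
    )

    func main() {
        var output []string
        var testData = "13242222160a06032c06cf00ca5c160bdc70102dfe0a12bc00a3b101000000cd01d60d0a13242222160a06032c0ccf00ca5bf10bdc74d029d05401fe0a12bc00a3b101000000d1e4270d0a1324222160a06032c1e0a12bc00a3b101000000d233ed0d0a"
        // Split removes the "1324" delimiter from every piece.
        results := strings.Split(testData, "1324")
        for idx := range results {
            // Skip empty pieces, such as the one before the leading "1324".
            if len(results[idx]) > 0 {
                // Put the delimiter back in front of each entry.
                output = append(output, fmt.Sprintf("%s%s", "1324", results[idx]))
            }
        }
        // Print the rebuilt entries, one per line.
        for _, entry := range output {
            fmt.Println(entry)
        }
    }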
Note that on my M1 MacBook Pro, the Split() example performed far better than the regex option when run with Go's benchmarks.