Go regex - error parsing regexp: invalid escape sequence: `\K`
Question
I'm trying to compile a regex so that I can extract an 8-digit number, with or without spaces between the digits, from a string using Go. For some reason the compilation fails. What should I replace `\K` with?
validAcc, err := regexp.Compile(`[ ]\K(?<!\d )(?=(?: ?\d){8})(?!(?: ?\d){9})\d[ \d]+\d`)
if err != nil {
	return
}
More code with sample data:
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	msg := ` 12 34 56 78 //the number we need
12 3455678 90123455 // the number we don't need`
	acc, err := accFromText(msg)
	if err != nil {
		panic(err)
	}
	exAcc := "12345678"
	if acc != exAcc {
		fmt.Printf("expected %v, received %v", exAcc, acc)
	}
	msg = `
More details here
1234567 12345 123456789 asd
12000000000 a number we don't need
12 3456 78 //this is the kind of number we need
12 3455678 90123455 // the number we don't need`
	acc, err = accFromText(msg)
	if err != nil {
		panic(err)
	}
	exAcc = "12345678"
	if acc != exAcc {
		fmt.Printf("expected %v, received %v", exAcc, acc)
	}
}

func accFromText(msg string) (accNumber string, err error) {
	validAcc, err := regexp.Compile(`[ ]\K(?<!\d )(?=(?: ?\d){8})(?!(?: ?\d){9})\d[ \d]+\d`)
	if err != nil {
		return
	}
	accNumber = string(validAcc.Find([]byte(msg)))
	accNumber = strings.Replace(accNumber, " ", "", -1)
	return
}
You can run it on the Go Playground: Play it here
Answer 1
Score: 4
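Considering that Go's regexp package uses RE2 syntax, which supports neither `\K` nor any lookbehind/lookahead, could you try a simpler expression first:
c, err := regexp.Compile(`\b\d{8}\b`)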
In your case (playground), this would work:
(\d\d ){4}
validAcc, err := regexp.Compile(`(\d\d ){4}`)
Or:
(\d\d ?){4} # matches '33 1133 06 Oth'
validAcc, err := regexp.Compile(`(\d\d ?){4}`)
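A sketch comparing the two patterns on both sample messages from the question, assuming the messages are exactly as quoted there; `firstDigits`, `sample1` and `sample2` are hypothetical names. The helper takes the leftmost match and strips the spaces.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// The two sample messages from the question.
const sample1 = ` 12 34 56 78 //the number we need
12 3455678 90123455 // the number we don't need`

const sample2 = `
More details here
1234567 12345 123456789 asd
12000000000 a number we don't need
12 3456 78 //this is the kind of number we need
12 3455678 90123455 // the number we don't need`

// firstDigits returns the leftmost match of pattern in msg with spaces
// removed, or "" if there is no match.
func firstDigits(pattern, msg string) string {
	m := regexp.MustCompile(pattern).FindString(msg)
	return strings.ReplaceAll(m, " ", "")
}

func main() {
	for _, p := range []string{`(\d\d ){4}`, `(\d\d ?){4}`} {
		fmt.Printf("%-12s sample1=%q sample2=%q\n",
			p, firstDigits(p, sample1), firstDigits(p, sample2))
	}
}
```

With these inputs, `(\d\d ){4}` finds 12345678 in the first message and nothing in the second (there are never four space-terminated digit pairs in a row), while `(\d\d ?){4}` finds 12345678 in the first but 23456712 in the second, because it starts matching inside `1234567 12345`. That is why the right choice depends on the data.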
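Again, try a simple regexp first, before trying more complex options: it will depend on the data you have to parse.
For a more complex case, the regexp alone can capture the data in a group, but you then need to extract the number found (meaning you need to add post-processing after the regexp):
validAcc, err := regexp.Compile(`[^\d]((\d\d ?){4})[^\d]`)
if err != nil {
	return
}
// FindString returns "" when nothing matches; guard before slicing.
match := validAcc.FindString(msg)
if match == "" {
	return
}
// Drop the non-digit characters matched on either side of the number.
accNumber = match[1 : len(match)-1]
accNumber = strings.Replace(accNumber, " ", "", -1)
See playground.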
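A sketch of the same post-processing that reads capture group 1 with `FindStringSubmatch` instead of slicing off the boundary characters by byte offset (slicing one byte would split a multi-byte character matched by `[^\d]`). `accFromGroup`, `msg1` and `msg2` are illustrative names, and the messages are assumed to be the question's samples.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// accRe is the pattern from the answer: a non-digit, four digit pairs (each
// optionally followed by a space) captured in group 1, then a non-digit.
var accRe = regexp.MustCompile(`[^\d]((\d\d ?){4})[^\d]`)

const msg1 = ` 12 34 56 78 //the number we need
12 3455678 90123455 // the number we don't need`

const msg2 = `
More details here
1234567 12345 123456789 asd
12000000000 a number we don't need
12 3456 78 //this is the kind of number we need
12 3455678 90123455 // the number we don't need`

// accFromGroup returns capture group 1 of the first match with spaces
// removed, or "" when nothing matches.
func accFromGroup(msg string) string {
	m := accRe.FindStringSubmatch(msg)
	if m == nil {
		return ""
	}
	return strings.ReplaceAll(m[1], " ", "")
}

func main() {
	fmt.Println(accFromGroup(msg1)) // 12345678
	fmt.Println(accFromGroup(msg2)) // 12345678
}
```

Note that `[^\d]` still requires a character on both sides of the number, so an 8-digit number at the very start or end of the string is not found.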
Answer 2
Score: 1
This will do the job without needing a regexp at all (which should also make it faster):
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode"
)

func main() {
	msg := ` 12 34 56 78 //the number we need
12 3455678 90123455 // the number we don't need`
	acc, err := accFromText(msg)
	if err != nil {
		panic(err)
	}
	exAcc := "12345678"
	if acc != exAcc {
		fmt.Printf("expected %v, received %v", exAcc, acc)
	}
	msg = `
More details here
1234567 12345 123456789 asd
12000000000 a number we don't need
12 3456 78 //this is the kind of number we need
12 3455678 90123455 // the number we don't need`
	acc, err = accFromText(msg)
	if err != nil {
		panic(err)
	}
	exAcc = "12345678"
	if acc != exAcc {
		fmt.Printf("expected %v, received %v", exAcc, acc)
	}
}

func accFromText(msg string) (accNumber string, err error) {
	// Split msg into lines: FieldsFunc splits on every control character,
	// which includes '\n' and '\r'.
	lines := strings.FieldsFunc(msg, func(c rune) bool {
		return unicode.IsControl(c)
	})
	// Collect the leading digits of a line, skipping spaces; stop at the
	// first character that is neither an ASCII digit nor whitespace.
	fn := func(ln string) (num string) {
		for _, c := range ln {
			if c >= '0' && c <= '9' {
				num += string(c)
			} else if !unicode.IsSpace(c) {
				return num
			}
		}
		return num
	}
	for _, line := range lines {
		num := fn(line)
		if len(num) == 8 { // exactly 8 digits on a line is the acceptance criterion
			return num, nil
		}
	}
	return "", errors.New("no 8-digit number found")
}
http://play.golang.org/p/yVDgDWO9hE
Answer 3
Score: 0
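I suggest you take two steps:
1) Use a regexp to find all matches of: \d[\d ]+\d
2) Keep only the matches that contain exactly 8 digits.
(I don't think you can do this with a single regex in Go.)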
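The two steps above can be sketched as follows, assuming the filter is "exactly 8 digits once spaces are removed" and that the first qualifying run wins; `firstEightDigits`, `msg1` and `msg2` are hypothetical names. Because `[\d ]` does not include `\n`, each run stays on a single line.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// runRe matches a run of digits and spaces that starts and ends with a digit.
var runRe = regexp.MustCompile(`\d[\d ]+\d`)

const msg1 = ` 12 34 56 78 //the number we need
12 3455678 90123455 // the number we don't need`

const msg2 = `
More details here
1234567 12345 123456789 asd
12000000000 a number we don't need
12 3456 78 //this is the kind of number we need
12 3455678 90123455 // the number we don't need`

// firstEightDigits: step 1 finds all runs, step 2 keeps the first run that
// has exactly 8 digits. It returns those digits without spaces, or "".
func firstEightDigits(msg string) string {
	for _, run := range runRe.FindAllString(msg, -1) {
		digits := strings.ReplaceAll(run, " ", "")
		// Go's \d is ASCII-only, so the byte length equals the digit count.
		if len(digits) == 8 {
			return digits
		}
	}
	return ""
}

func main() {
	fmt.Println(firstEightDigits(msg1)) // 12345678
	fmt.Println(firstEightDigits(msg2)) // 12345678
}
```

Adjacent digit groups on one line merge into a single run (for example `1234567 12345 123456789` becomes one 21-digit run), which is what rejects that line here.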