Go regex - error parsing regexp: invalid escape sequence: `\K`


Question

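I'm trying to compile a regex so that I can extract an 8-digit number, with or without spaces between the digits, from a string using Go. For some reason the compilation fails. What should I replace `\K` with?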

validAcc, err := regexp.Compile(`[ ]\K(?<!\d )(?=(?: ?\d){8})(?!(?: ?\d){9})\d[ \d]+\d`)
if err != nil {
    return
}

More code with sample data:

package main

import "strings"
import "regexp"
import "fmt"

func main() {

    msg := ` 12 34 56 78 //the number we need
 12 3455678 90123455 // the number we don't need`

    acc, err := accFromText(msg)
    if err != nil {
        panic(err)
    }
    exAcc := "12345678"
    if acc != exAcc {
        fmt.Printf("expected %v, received %v", exAcc, acc)
    }

    msg = `
More details here
1234567 12345 123456789 asd
12000000000 a number we don't need 
 12 3456 78 //this is the kind of number we need
 12 3455678 90123455 // the number we don't need`

    acc, err = accFromText(msg)
    if err != nil {
        panic(err)
    }
    exAcc = "12345678"
    if acc != exAcc {
        fmt.Printf("expected %v, received %v", exAcc, acc)
    }

}

func accFromText(msg string) (accNumber string, err error) {
    validAcc, err := regexp.Compile(`[ ]\K(?<!\d )(?=(?: ?\d){8})(?!(?: ?\d){9})\d[ \d]+\d`)
    if err != nil {
        return
    }
    accNumber = string(validAcc.Find([]byte(msg)))
    accNumber = strings.Replace(accNumber, " ", "", -1)
    return
}

Play it here


Answer 1

Score: 4

Go's regexp package implements RE2 syntax, which supports neither lookahead/lookbehind nor `\K`, so try a simpler expression first:

c, err := regexp.Compile(`\b\d{8}\b`)

In your case (playground), this would work:

(\d\d ){4}
validAcc, err := regexp.Compile(`(\d\d ){4}`)

Or:

(\d\d ?){4} # matches '33 1133 06 Oth'
validAcc, err := regexp.Compile(`(\d\d ?){4}`)

Again, try a simple regexp first before moving to more complex options: which one works will depend on the data you have to parse.
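To illustrate how much the data matters, here is a sketch that runs both simple patterns on lines adapted from the question's samples (the second string is shortened, and `firstMatch` is a hypothetical helper, not part of the answer).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	pairsWithSpace = regexp.MustCompile(`(\d\d ){4}`)  // every pair must be followed by a space
	pairsOptSpace  = regexp.MustCompile(`(\d\d ?){4}`) // the space after a pair is optional
)

// firstMatch returns the leftmost match with its spaces removed, or "" if there is none.
func firstMatch(re *regexp.Regexp, msg string) string {
	return strings.ReplaceAll(re.FindString(msg), " ", "")
}

func main() {
	first := " 12 34 56 78 //the number we need"
	second := "1234567 12345 123456789 asd\n 12 3456 78 //this is the kind of number we need"

	fmt.Println(firstMatch(pairsWithSpace, first))         // 12345678
	fmt.Println(firstMatch(pairsOptSpace, first))          // 12345678
	fmt.Printf("%q\n", firstMatch(pairsWithSpace, second)) // "" ("3456" has no space after "34")
	fmt.Println(firstMatch(pairsOptSpace, second))         // 23456712 (the match "234567 12" starts inside 1234567)
}
```

Without anything bounding the digits, the optional-space version happily starts in the middle of `1234567`, which is what the more complex pattern addresses by requiring a non-digit on each side.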


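For a more complex case, the regexp alone can capture the number in a group; you then need to extract the number found (meaning you need to add some post-processing after the regexp):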

validAcc, err := regexp.Compile(`[^\d]((\d\d ?){4})[^\d]`)
if err != nil {
    return
}
// Group 1 holds the digit pairs without the bounding non-digit characters;
// m is nil when nothing matches, so there is no out-of-range slicing.
if m := validAcc.FindStringSubmatch(msg); m != nil {
    accNumber = strings.Replace(m[1], " ", "", -1)
}

See playground
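A self-contained sketch of the capture-group approach, run on the question's two sample messages. `FindStringSubmatch` returns the whole match at index 0 and the outer group `((\d\d ?){4})` at index 1, so the bounding characters are excluded and a missing match (nil) can be detected before indexing. The sentinel error `errNoAcc` is a hypothetical addition for this sketch; the last two calls show limitations of the pattern itself.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// A non-digit, four digit pairs (each optionally followed by a space), then a non-digit.
// Group 1 is the outer group: the digit pairs without the bounding characters.
var validAcc = regexp.MustCompile(`[^\d]((\d\d ?){4})[^\d]`)

// errNoAcc is a hypothetical sentinel error for "no match".
var errNoAcc = errors.New("no 8-digit number found")

func accFromText(msg string) (string, error) {
	m := validAcc.FindStringSubmatch(msg) // nil when nothing matches
	if m == nil {
		return "", errNoAcc
	}
	return strings.ReplaceAll(m[1], " ", ""), nil
}

func main() {
	msgs := []string{
		" 12 34 56 78 //the number we need\n 12 3455678 90123455 // the number we don't need",
		"\nMore details here\n1234567 12345 123456789 asd\n12000000000 a number we don't need \n" +
			" 12 3456 78 //this is the kind of number we need\n 12 3455678 90123455 // the number we don't need",
	}
	for _, msg := range msgs {
		acc, err := accFromText(msg)
		fmt.Println(acc, err) // 12345678 <nil>
	}

	// Limitation 1: the leading [^\d] needs a character before the number,
	// so a number at the very start of the input is not found.
	_, err := accFromText("12 34 56 78")
	fmt.Println(err) // no 8-digit number found

	// Limitation 2: the last pair may leave its optional space unmatched, so the
	// space before a ninth digit satisfies the closing [^\d].
	acc, _ := accFromText(" 12 34 56 78 9 ")
	fmt.Println(acc) // 12345678
}
```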


Answer 2

Score: 1

This will do the job (faster: no regexp needed):

package main

import "fmt"
import "unicode"
import "strings"

func main() {

    msg := ` 12 34 56 78 //the number we need
 12 3455678 90123455 // the number we don't need`

    acc, err := accFromText(msg)
    if err != nil {
        panic(err)
    }
    exAcc := "12345678"
    if acc != exAcc {
        fmt.Printf("expected %v, received %v", exAcc, acc)
    }

    msg = `
More details here
1234567 12345 123456789 asd
12000000000 a number we don't need 
 12 3456 78 //this is the kind of number we need
 12 3455678 90123455 // the number we don't need`

    acc, err = accFromText(msg)
    if err != nil {
        panic(err)
    }
    exAcc = "12345678"
    if acc != exAcc {
        fmt.Printf("expected %v, received %v", exAcc, acc)
    }

}

func accFromText(msg string) (accNumber string, err error) {
    // split msg into lines
    lines := strings.FieldsFunc(msg, func(c rune) bool {
        return unicode.IsControl(c)
    })

    // filter numbers
    fn := func(ln string) (num string) {
        for _, c := range []rune(ln) {
            if unicode.IsNumber(c) {
                num += string(c)
                // fmt.Println(num)
            } else if !unicode.IsSpace(c) {
                return num
            }
        }
        return num
    }

    for _, line := range lines {
        num := fn(line)
        if len(num) == 8 {  // exactly 8 digits in a line is the acceptance criterion
            return num, nil
        }
    }
    return "eee", nil  // Note: Change this later; it's only needed to satisfy func calls above
}

http://play.golang.org/p/yVDgDWO9hE


Answer 3

Score: 0

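I suggest you take two steps:

  1. use a regexp to find all matches of \d[\d ]+\d

  2. keep only the matches that contain exactly 8 digits

(I don't think you can do this with a single regex in Go.)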
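A minimal sketch of the two steps, assuming the first qualifying run is the one wanted. The function name `accFromText` mirrors the question; returning a boolean instead of an error is a choice made for this sketch.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Step 1: runs that start and end with a digit and contain only digits and spaces.
var digitRun = regexp.MustCompile(`\d[\d ]+\d`)

// accFromText returns the first run that contains exactly eight digits once its
// spaces are removed; ok is false when no run qualifies.
func accFromText(msg string) (acc string, ok bool) {
	for _, run := range digitRun.FindAllString(msg, -1) {
		// Step 2: keep only runs with exactly eight digits
		// (\d is ASCII-only in RE2, so the byte length equals the digit count).
		if digits := strings.ReplaceAll(run, " ", ""); len(digits) == 8 {
			return digits, true
		}
	}
	return "", false
}

func main() {
	msgs := []string{
		" 12 34 56 78 //the number we need\n 12 3455678 90123455 // the number we don't need",
		"1234567 12345 123456789 asd\n12000000000 a number we don't need \n 12 3456 78 //this is the kind of number we need",
		" 12 34 56 78 9 ", // a spaced nine-digit number
	}
	for _, msg := range msgs {
		acc, ok := accFromText(msg)
		fmt.Printf("%q %v\n", acc, ok)
	}
}
```

Step 2 does the job of the question's `(?!(?: ?\d){9})` lookahead: a spaced nine-digit number such as `12 34 56 78 9` is rejected instead of being cut down to eight digits. One trade-off is that two numbers separated only by spaces (for example `12345678 87654321`) merge into a single run and are both rejected.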


huangapple
  • Posted on 2014-07-19 12:53:14
  • If you repost this article, please keep its link: https://go.coder-hub.com/24836885.html