Comparing strings in Go

Question

I'm trying to find the beginning of a named capturing group in a string to create a simple parser (see related question). To do this, the extract function remembers the last four characters in the last4 variable. If the last four characters are equal to "(?P<", it is the beginning of a capturing group:

    package main

    import "fmt"

    const sample string = `/(?P<country>m((a|b).+)(x|y)n)/(?P<city>.+)`

    func main() {
        extract(sample)
    }

    func extract(regex string) {
        last4 := new([4]int32)
        for _, c := range regex {
            last4[0], last4[1], last4[2], last4[3] = last4[1], last4[2], last4[3], c
            last4String := fmt.Sprintf("%c%c%c%c\n", last4[0], last4[1], last4[2], last4[3])
            if last4String == "(?P<" {
                fmt.Print("start of capturing group")
            }
        }
    }

http://play.golang.org/p/pqA-wCuvux

But this code prints nothing! last4String == "(?P<" is never true, although this substring appears in the output if I print last4String inside the loop. How do I compare strings in Go, then?

And is there a more elegant way to convert an int32 array to a string than fmt.Sprintf("%c%c%c%c\n", last4[0], last4[1], last4[2], last4[3])?

Anything else that could be better? My code looks somewhat inelegant to me.

Answer 1

Score: 3

If this isn't for self-education or similar, you probably want to use the existing RE parser in the standard library (package regexp/syntax) and then "walk" the AST to do whatever is required.

    func Parse(s string, flags Flags) (*Regexp, error)

> Parse parses a regular expression string s, controlled by the specified Flags,
> and returns a regular expression parse tree. The syntax is described in the
> top-level comment for package regexp.

There's even a helper for your task.

EDIT1: Your code repaired:

    package main

    import "fmt"

    const sample string = `/(?P<country>m((a|b).+)(x|y)n)/(?P<city>.+)`

    func main() {
        extract(sample)
    }

    func extract(regex string) {
        // A plain array value is enough; new() is not needed.
        var last4 [4]int32
        for _, c := range regex {
            // Shift the window left by one rune and append the current one.
            last4[0], last4[1], last4[2], last4[3] = last4[1], last4[2], last4[3], c
            // No trailing "\n" here: it made the string five characters long,
            // so it could never equal the four-character "(?P<".
            last4String := fmt.Sprintf("%c%c%c%c", last4[0], last4[1], last4[2], last4[3])
            if last4String == "(?P<" {
                fmt.Println("start of capturing group")
            }
        }
    }

(Also here)
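
On the question of converting the int32 array more elegantly: because rune is an alias for int32, a slice of the array can be converted with string(...) directly, and fixed-size arrays can also be compared with == without building a string at all. The following is a minimal sketch of both ideas and is not part of the original answer; the names groupStart and isGroupStart are made up for illustration.

```go
package main

import "fmt"

// groupStart holds the four runes that open a named capturing group.
var groupStart = [4]rune{'(', '?', 'P', '<'}

// isGroupStart (hypothetical helper) reports whether the window holds "(?P<".
// Arrays of comparable element types can be compared with ==.
func isGroupStart(window [4]rune) bool {
	return window == groupStart
}

func main() {
	last4 := [4]int32{'(', '?', 'P', '<'}

	// rune is an alias for int32, so a []int32 converts straight to a string.
	s := string(last4[:])
	fmt.Println(s == "(?P<")

	// The format string from the question appends "\n", adding one byte.
	withNewline := fmt.Sprintf("%c%c%c%c\n", last4[0], last4[1], last4[2], last4[3])
	fmt.Println(len(s), len(withNewline))

	// Comparing the arrays directly avoids the conversion altogether.
	fmt.Println(isGroupStart(last4))
}
```

The array comparison avoids allocating a new string on every loop iteration, which may matter for long inputs.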

EDIT2: Your code rewritten:

    package main

    import (
        "fmt"
        "strings"
    )

    const sample string = `/(?P<country>m((a|b).+)(x|y)n)/(?P<city>.+)`

    func main() {
        extract(sample)
    }

    func extract(regex string) {
        start := 0
        for {
            i := strings.Index(regex[start:], "(?P<")
            if i < 0 {
                break
            }
            fmt.Printf("start of capturing group @ %d\n", start+i)
            start += i + 1
        }
    }

(Also here)

huangapple
  • Posted on 2012-11-12 05:19:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/13335556.html