Comparing strings in Go

Question

I'm trying to find the beginning of named capturing groups in a string in order to create a simple parser (see related question). To do this, the extract function remembers the last four characters in the last4 variable. If the last four characters are equal to "(?P<", it is the beginning of a capturing group:

package main

import "fmt"

const sample string = `/(?P<country>m((a|b).+)(x|y)n)/(?P<city>.+)`

func main() {
	extract(sample)
}

func extract(regex string) {
	last4 := new([4]int32)
	for _, c := range regex {
		last4[0], last4[1], last4[2], last4[3] = last4[1], last4[2], last4[3], c
		last4String := fmt.Sprintf("%c%c%c%c\n", last4[0], last4[1], last4[2], last4[3])
		if last4String == "(?P<" {
			fmt.Print("start of capturing group")
		}
	}
}

http://play.golang.org/p/pqA-wCuvux

But this code prints nothing! last4String == "(?P<" is never true, although this substring appears in the output if I print last4String inside the loop. How do I compare strings in Go, then?

And is there a more elegant way to convert an int32 array to a string than fmt.Sprintf("%c%c%c%c\n", last4[0], last4[1], last4[2], last4[3])?

Anything else that could be better? My code looks somewhat inelegant to me.

Answer 1

Score: 3

If it's not for self-education or similar, you probably want to use the existing RE parser in the standard library (package regexp/syntax) and then "walk" the AST to do whatever is required.

func Parse(s string, flags Flags) (*Regexp, error)

> Parse parses a regular expression string s, controlled by the specified Flags,
> and returns a regular expression parse tree. The syntax is described in the
> top-level comment for package regexp.

There's even a helper for your task.
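Walking the parse tree could look roughly like the sketch below. It assumes the parser meant above is regexp/syntax with the syntax.Perl flags; the function name namedGroups is made up for illustration and is not necessarily the helper the answer links to. The tree is visited left to right, and the name of every named capture node is collected.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

const sample = `/(?P<country>m((a|b).+)(x|y)n)/(?P<city>.+)`

// namedGroups (hypothetical helper) parses pattern and returns the names
// of all named capturing groups in left-to-right order.
func namedGroups(pattern string) ([]string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return nil, err
	}
	var names []string
	var walk func(n *syntax.Regexp)
	walk = func(n *syntax.Regexp) {
		// A capture node with a non-empty Name is a (?P<name>...) group.
		if n.Op == syntax.OpCapture && n.Name != "" {
			names = append(names, n.Name)
		}
		for _, sub := range n.Sub {
			walk(sub)
		}
	}
	walk(re)
	return names, nil
}

func main() {
	names, err := namedGroups(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names)
}
```

If the names of all groups are enough, the CapNames method on *syntax.Regexp returns them directly, with empty strings for unnamed groups.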

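EDIT1: Your code repaired (the trailing \n in the Sprintf format string made last4String five characters long, so it could never equal the four-character "(?P<"):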

package main

import "fmt"

const sample string = `/(?P<country>m((a|b).+)(x|y)n)/(?P<city>.+)`

func main() {
	extract(sample)
}

func extract(regex string) {
	var last4 [4]int32
	for _, c := range regex {
		last4[0], last4[1], last4[2], last4[3] = last4[1], last4[2], last4[3], c
		last4String := fmt.Sprintf("%c%c%c%c", last4[0], last4[1], last4[2], last4[3])
		if last4String == "(?P<" {
			fmt.Println("start of capturing group")
		}
	}
}

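On the side question about converting the int32 array: since rune is an alias for int32, a slice of the array can be converted to a string directly, and arrays of the same type can also be compared with == without building a string at all. The following is a small sketch of both options; the names windowString and isGroupStart are made up for illustration.

```go
package main

import "fmt"

// windowString (hypothetical helper) converts the four-rune window to a string.
// []int32 and []rune are the same type, so the conversion needs no Sprintf.
func windowString(last4 [4]int32) string {
	return string(last4[:])
}

// isGroupStart (hypothetical helper) compares the window as an array,
// avoiding the string conversion entirely.
func isGroupStart(last4 [4]int32) bool {
	return last4 == [4]int32{'(', '?', 'P', '<'}
}

func main() {
	last4 := [4]int32{'(', '?', 'P', '<'}

	// The original format string appended "\n", giving a 5-byte string.
	withNewline := fmt.Sprintf("%c%c%c%c\n", last4[0], last4[1], last4[2], last4[3])
	fmt.Println(len(withNewline), withNewline == "(?P<")

	fmt.Println(windowString(last4) == "(?P<")
	fmt.Println(isGroupStart(last4))
}
```

The first line printed shows why the comparison in the question failed: the string has length 5 and the comparison is false, while the two alternatives both report true.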

EDIT2: Your code rewritten:

package main

import (
	"fmt"
	"strings"
)

const sample string = `/(?P<country>m((a|b).+)(x|y)n)/(?P<city>.+)`

func main() {
	extract(sample)
}

func extract(regex string) {
	start := 0
	for {
		// Search only the part of the string not yet examined.
		i := strings.Index(regex[start:], "(?P<")
		if i < 0 {
			break
		}

		// i is relative to regex[start:], so add start for the absolute offset.
		fmt.Printf("start of capturing group @ %d\n", start+i)
		// Resume one byte past the match so the same group is not found again.
		start += i + 1
	}
}

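The same strings.Index loop can be extended to return the group names as well as the offsets. This is only a sketch under simplifying assumptions: the group type and the findGroups name are made up for illustration, and escaped sequences such as `\(?P<` are not handled (the standard-library parser above does handle them).

```go
package main

import (
	"fmt"
	"strings"
)

const sample = `/(?P<country>m((a|b).+)(x|y)n)/(?P<city>.+)`

const marker = "(?P<"

// group (hypothetical type) records where a named group starts and its name.
type group struct {
	Pos  int
	Name string
}

// findGroups (hypothetical helper) scans regex for "(?P<name>" occurrences.
func findGroups(regex string) []group {
	var groups []group
	start := 0
	for {
		i := strings.Index(regex[start:], marker)
		if i < 0 {
			break
		}
		pos := start + i
		nameStart := pos + len(marker)
		end := strings.Index(regex[nameStart:], ">")
		if end < 0 {
			break // unterminated group name: stop scanning
		}
		groups = append(groups, group{Pos: pos, Name: regex[nameStart : nameStart+end]})
		// Continue after the closing '>' of this group's name.
		start = nameStart + end + 1
	}
	return groups
}

func main() {
	for _, g := range findGroups(sample) {
		fmt.Printf("group %q starts @ %d\n", g.Name, g.Pos)
	}
}
```

For the sample string this reports the groups country at offset 1 and city at offset 31.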

huangapple
  • Posted on 2012-11-12 05:19:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/13335556.html