How can I iterate over a pair of strings rune by rune in Go?

Question

What's the idiomatic Go way of iterating over two strings and comparing them rune by rune?

Given my limited understanding of Go, a simple way to do this would look like:

arunes := []rune(astr)
brunes := []rune(bstr)
for i, a := range arunes {
	if i >= len(brunes) {
		break // bstr has fewer runes than astr
	}
	b := brunes[i]
	// do something with a and b
}

This works OK when astr and bstr are short or a full scan is required anyway. But when they are long and there is a high chance of breaking out of the loop early, it may be inefficient, because, from what I understand, []rune(..) requires a full scan of the string. In particular, if the strings are very long and I only need to look at, say, the first 1% of them, I want to avoid scanning the entire strings.
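For comparison, ranging over a string with `for _, r := range s` decodes one rune per iteration without building a []rune first, so a loop that breaks early never touches the rest of that string. A minimal sketch of that approach, pairing a `range` over the first string with manual `utf8.DecodeRuneInString` calls on the second; `commonPrefixLen` is a hypothetical helper name chosen only for illustration:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// commonPrefixLen (hypothetical helper) counts how many leading runes
// a and b have in common. It stops at the first mismatch, so the rest
// of either string is never decoded.
func commonPrefixLen(a, b string) int {
	n := 0
	for _, ra := range a { // range decodes a one rune at a time
		rb, size := utf8.DecodeRuneInString(b)
		if size == 0 || ra != rb { // b is exhausted or the runes differ
			break
		}
		b = b[size:] // advance b past the rune just compared
		n++
	}
	return n
}

func main() {
	fmt.Println(commonPrefixLen("héllo world", "héllo, 世界"))
}
```

Invalid UTF-8 bytes decode to utf8.RuneError on both sides, so this sketch treats any two malformed bytes as equal; adjust the comparison if that matters for your data.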

(My first thought was some sort of zip, but as far as I know, this doesn't exist in Go, and due to the lack of generics the function signature would look really gross anyway. But if Go does have a good, clean alternative to zip, I would be delighted to learn about it.)
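Go 1.18 (released after this question) added generics, and Go 1.23 added range-over-func iterators, but a zip over the runes of two strings needs neither. A minimal sketch, using the hypothetical name `zipRunes`, of a push-style iterator: it calls `yield` once per pair of runes and stops as soon as `yield` returns false or either string runs out, so an early exit never decodes the remainder of either string.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// zipRunes (hypothetical) returns a push iterator over pairs of runes
// taken from a and b in lockstep. Iteration ends when either string is
// exhausted or when yield returns false.
func zipRunes(a, b string) func(yield func(ra, rb rune) bool) {
	return func(yield func(ra, rb rune) bool) {
		s1, s2 := a, b // local copies so the iterator can be reused
		for len(s1) > 0 && len(s2) > 0 {
			r1, n1 := utf8.DecodeRuneInString(s1)
			r2, n2 := utf8.DecodeRuneInString(s2)
			if !yield(r1, r2) {
				return // the caller asked to stop early
			}
			s1, s2 = s1[n1:], s2[n2:]
		}
	}
}

func main() {
	// Works on any Go version: invoke the iterator with a callback.
	zipRunes("hello world", "Hello, 世界")(func(ra, rb rune) bool {
		fmt.Printf("%c %c\n", ra, rb)
		return true // keep going; return false to stop early
	})
}
```

With Go 1.23 or later the same function can be used directly in a loop, `for ra, rb := range zipRunes(a, b) { ... }`, and a `break` in that loop makes `yield` return false.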

TL;DR: Is there an idiomatic Go way to iterate over the runes of two strings in pairs that stays efficient when only a small fraction of each string needs to be looked at?

Answer 1

Score: 6

Use utf8.DecodeRuneInString to get runes from each string.

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	s1 := "hello world"
	s2 := "Hello, 世界"
	for {
		r1, n1 := utf8.DecodeRuneInString(s1)
		r2, n2 := utf8.DecodeRuneInString(s2)

		// DecodeRuneInString returns a size of 0 when the string
		// is empty, i.e. at the end of the string. I break the loop
		// here when the end of either string is reached. Update
		// the logic as appropriate for your application.
		if n1 == 0 || n2 == 0 {
			break
		}

		// Process the runes.
		fmt.Printf("%c %c\n", r1, r2)

		// Advance to the next rune.
		s1 = s1[n1:]
		s2 = s2[n2:]
	}
}

Run the example on the playground.

huangapple
  • Published on July 22, 2021 at 00:07:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68472941.html