Skipping ahead n codepoints while iterating through a Unicode string in Go

Question

In Go, iterating over a string using

for i := 0; i < len(myString); i++ {
    doSomething(myString[i])
}

only accesses individual bytes in the string, whereas iterating over a string via

for _, c := range myString {
    doSomething(c)
}

iterates over individual Unicode codepoints (called runes in Go), which may span multiple bytes.
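As a minimal sketch of that difference (the helper name `byteOffsets` is hypothetical, not from the question), the function below records what a `range` loop reports for each code point. For the string `"h\u00e9llo"`, which is 6 bytes but 5 code points, the reported indices should be 0, 1, 3, 4, 5: the index jumps by two across the `é` because `range` yields byte offsets, while indexing `s[1]` on its own returns only the first byte (0xC3) of that character's UTF-8 encoding.

```go
package main

// byteOffsets returns the index that `for i, c := range s` reports for each
// code point, alongside the decoded runes themselves.
func byteOffsets(s string) ([]int, []rune) {
	var offsets []int
	var runes []rune
	for i, c := range s {
		offsets = append(offsets, i) // i is a byte offset, not a rune count
		runes = append(runes, c)
	}
	return offsets, runes
}
```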

My question is: how does one go about jumping ahead while iterating over a string with range myString? continue can jump ahead by one Unicode codepoint, but it's not possible to just do i += 3, for instance, if you want to jump ahead three codepoints. So what would be the most idiomatic way to advance forward by n codepoints?
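A small sketch of why `i += 3` has no effect inside a `range` loop (the function name `visitWithNaiveSkip` is made up for illustration): the loop keeps track of its own position and reassigns `i` at the start of every iteration, so changing `i` in the body does not change which code point is visited next. Even if it did, `i` counts bytes rather than code points.

```go
package main

// visitWithNaiveSkip collects every code point a range loop visits while
// the body tries to jump ahead by modifying the index variable.
func visitWithNaiveSkip(s string) []rune {
	var visited []rune
	for i, c := range s {
		visited = append(visited, c)
		i += 3 // naive attempt to jump ahead; the next iteration ignores it
		_ = i  // keep the compiler's unused-variable check satisfied
	}
	return visited
}
```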

I asked this question on the golang nuts mailing list, and it was answered, courtesy of some of the helpful folks on the list. However, someone messaged me suggesting I create a self-answered question on Stack Overflow for this, to save the next person with the same issue some trouble. That's what this is.

Answer 1

Score: 6

I'd consider avoiding the conversion to []rune and coding this directly.

skip := 0
for _, c := range myString {
    if skip > 0 {
        skip--
        continue
    }
    skip = doSomething(c)
}

It looks inefficient to skip runes one by one like this, but it's the same amount of work as the conversion to []rune would be. The advantage of this code is that it avoids allocating the rune slice, which can be up to 4 times larger than the original string: each rune occupies 4 bytes, whereas a code point takes 1 to 4 bytes in UTF-8, so the ratio depends on how many larger code points you have. Of course, converting to []rune is a bit simpler, so you may prefer that.
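Here is a self-contained sketch of this countdown approach, assuming `doSomething` returns the number of following code points to skip. The names `visitSkipping` and `skipRule`, and the rule that a `€` skips the next two code points, are hypothetical and only there to make the behaviour checkable.

```go
package main

// skipRule is a hypothetical stand-in for doSomething: it returns how many
// of the following code points to skip. Here, a '€' skips the next two.
func skipRule(c rune) int {
	if c == '€' {
		return 2
	}
	return 0
}

// visitSkipping ranges over s directly (no []rune allocation) and uses a
// countdown to skip code points, returning the ones that were processed.
func visitSkipping(s string, doSomething func(rune) int) []rune {
	var visited []rune
	skip := 0
	for _, c := range s {
		if skip > 0 {
			skip-- // consume one code point of the pending skip
			continue
		}
		visited = append(visited, c)
		skip = doSomething(c)
	}
	return visited
}
```

With that rule, `visitSkipping("a€bcdé", skipRule)` should process `a`, `€`, `d` and `é`, and a skip that runs past the end of the string simply ends the loop. For comparison, `rune` is an alias for `int32`, so `[]rune(s)` needs 4 bytes per code point, which for an all-ASCII string is four times `len(s)`.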

Answer 2

Score: 2

It turns out this can be done quite easily by converting the string into a slice of runes.

runes := []rune(myString)
for i := 0; i < len(runes); i++ {
    jumpHowFarAhead := doSomething(runes[i])
    i += jumpHowFarAhead
}
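A runnable sketch of this loop, under the same assumption that `doSomething` returns how many extra code points to skip (the wrapper name `visitRunes` and the `visited` slice are illustrative additions, not part of the answer):

```go
package main

// visitRunes converts s to a []rune once, then uses an ordinary index loop
// so the index can be advanced by any number of code points.
func visitRunes(s string, doSomething func(rune) int) []rune {
	runes := []rune(s) // allocates 4 bytes per code point
	var visited []rune
	for i := 0; i < len(runes); i++ {
		visited = append(visited, runes[i])
		// Advance past the skipped runes; the loop's own i++ then moves
		// on to the next code point to process.
		i += doSomething(runes[i])
	}
	return visited
}
```

Because the `for` statement adds its own `i++`, a return value of 2 skips exactly two code points, matching the countdown version in Answer 1.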

huangapple
  • Posted on April 20, 2014 16:00:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/23179824.html