Skipping ahead n codepoints while iterating through a unicode string in Go






In Go, iterating over a string using

    for i := 0; i < len(myString); i++ {
        doSomething(myString[i]) // myString[i] is a single byte
    }

only accesses individual bytes in the string, whereas iterating over a string via

    for i, c := range myString {
        doSomething(c) // c is a rune; i is the byte offset at which it starts
    }

iterates over individual Unicode code points (called runes in Go), which may span multiple bytes.
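To make the difference concrete, here is a minimal sketch that counts the iterations of each loop style. The helper name `byteAndRuneCounts` and the sample string are assumptions made for illustration; they are not part of the original question.

```go
package main

import "fmt"

// byteAndRuneCounts (hypothetical helper) reports how many iterations the
// index-based loop and the range loop perform over the same string.
func byteAndRuneCounts(s string) (byteCount, runeCount int) {
	for i := 0; i < len(s); i++ {
		byteCount++ // one iteration per byte
	}
	for range s {
		runeCount++ // one iteration per Unicode code point
	}
	return byteCount, runeCount
}

func main() {
	s := "h\u00e9llo" // "héllo"; é takes two bytes in UTF-8
	b, r := byteAndRuneCounts(s)
	fmt.Println(b, r) // prints "6 5"

	// The index produced by range is a byte offset, so it jumps by 2 after é.
	for i, c := range s {
		fmt.Printf("byte offset %d: %c\n", i, c)
	}
}
```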

My question is: how does one jump ahead while iterating over a string with range myString? continue skips one Unicode code point, but you can't simply do i += 3 to jump ahead three code points, because i is a byte offset and assigning to it inside a range loop has no effect on the iteration. So what is the most idiomatic way to advance by n code points?

I asked this question on the golang-nuts mailing list, and it was answered by some of the helpful folks there. However, someone messaged me suggesting I create a self-answered question on Stack Overflow to save the next person with the same issue some trouble. That's what this is.


Score: 6





I'd consider avoiding the conversion to []rune, and code this directly.

    skip := 0
    for _, c := range myString {
        if skip > 0 {
            // Still skipping: consume this rune without processing it.
            skip--
            continue
        }
        // doSomething returns how many of the following runes to skip.
        skip = doSomething(c)
    }

It looks inefficient to skip runes one by one like this, but it's the same amount of work as the conversion to []rune would be, since that conversion also has to decode every rune. The advantage of this code is that it avoids allocating the rune slice, which uses 4 bytes per rune and can therefore be up to about 4 times larger than the original string (less if the string contains many multi-byte code points). Of course, converting to []rune is a bit simpler, so you may prefer that.
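For a runnable version of the skip-counter loop above, here is a small sketch. The wrapper `visitWithSkips`, the callback signature, and the rule "skip two runes after an arrow" are all assumptions chosen for illustration; the answer only specifies that doSomething returns the number of runes to skip.

```go
package main

import "fmt"

// visitWithSkips (hypothetical wrapper) ranges over s and calls visit for
// every rune that is not being skipped. visit returns how many of the
// following runes should be skipped.
func visitWithSkips(s string, visit func(r rune) int) {
	skip := 0
	for _, c := range s {
		if skip > 0 {
			skip--
			continue
		}
		skip = visit(c)
	}
}

func main() {
	var visited []rune
	visitWithSkips("a→bc→de", func(r rune) int {
		visited = append(visited, r)
		if r == '→' {
			return 2 // illustrative rule: skip the two runes after an arrow
		}
		return 0
	})
	fmt.Println(string(visited)) // prints "a→→"
}
```

No rune slice is allocated for the input here; the only allocation is the `visited` slice used to show the result.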


Score: 2



It turns out this can be done quite easily by converting the string to a slice of runes, which can then be indexed by code point.

    runes := []rune(myString)
    for i := 0; i < len(runes); i++ {
        // doSomething returns how many of the following runes to skip.
        jumpHowFarAhead := doSomething(runes[i])
        i += jumpHowFarAhead
    }
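As a rough, self-contained sketch of the rune-slice approach, the example below wraps the loop in a function. The name `visitRunes` and the rule "a digit says how many following runes to skip" are hypothetical and only serve to exercise the loop.

```go
package main

import "fmt"

// visitRunes (hypothetical wrapper) converts s to a rune slice so that the
// loop index counts code points, then lets visit decide how far to jump.
func visitRunes(s string, visit func(r rune) int) {
	runes := []rune(s) // allocates 4 bytes per code point
	for i := 0; i < len(runes); i++ {
		// The loop's own i++ moves past the current rune; the returned
		// value skips that many additional runes.
		i += visit(runes[i])
	}
}

func main() {
	var visited []rune
	// "é2abç1dz": after a digit, skip that many runes (illustrative rule).
	visitRunes("\u00e92ab\u00e71dz", func(r rune) int {
		visited = append(visited, r)
		if r >= '0' && r <= '9' {
			return int(r - '0')
		}
		return 0
	})
	fmt.Println(string(visited)) // prints "é2ç1z"
}
```

Because the slice is indexed by code point, this form also allows looking ahead or behind with `runes[i+1]` or `runes[i-1]`, which the range-based loop cannot do directly.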

  • Posted on April 20, 2014, 16:00:43
