Skipping ahead n codepoints while iterating through a Unicode string in Go
Question
In Go, iterating over a string using
for i := 0; i < len(myString); i++ {
	doSomething(myString[i])
}
only accesses individual bytes in the string, whereas iterating over a string via
for i, c := range myString {
	doSomething(c)
}
iterates over individual Unicode codepoints (called runes in Go), which may span multiple bytes.
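To make the difference concrete, here is a minimal sketch that counts the iterations of each loop style. The helper name byteAndRuneCounts and the sample string are illustrative assumptions, not part of the original question.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts (hypothetical helper) reports how many iterations the
// index-based loop and the range loop perform over the same string.
func byteAndRuneCounts(s string) (byteSteps, runeSteps int) {
	for i := 0; i < len(s); i++ {
		byteSteps++ // one step per byte
	}
	for range s {
		runeSteps++ // one step per decoded codepoint
	}
	return byteSteps, runeSteps
}

func main() {
	s := "h\u00e9llo" // "héllo": the é is encoded as two bytes in UTF-8
	b, r := byteAndRuneCounts(s)
	fmt.Println(b, r, utf8.RuneCountInString(s))

	// With range, the index is the byte offset at which each rune starts,
	// so it can advance by more than one between iterations.
	for i, c := range s {
		fmt.Printf("%d:%c ", i, c)
	}
	fmt.Println()
}
```

Note that the index produced by range is a byte offset, not a codepoint count, which is part of why arithmetic on it does not translate into "skip n codepoints".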
My question is: how does one go about jumping ahead while iterating over a string with range myString? A continue statement can jump ahead by one Unicode codepoint, but it's not possible to just do i += 3, for instance, if you want to jump ahead three codepoints. So what would be the most idiomatic way to advance forward by n codepoints?
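As a small illustration of why modifying the index does not help, the sketch below counts iterations of a range loop whose body adds 3 to the index. The function name iterationsDespiteIncrement is a hypothetical helper introduced only for this demonstration.

```go
package main

import "fmt"

// iterationsDespiteIncrement (hypothetical helper) counts loop iterations
// when the body tries to jump ahead by changing the range index variable.
func iterationsDespiteIncrement(s string) int {
	n := 0
	for i := range s {
		n++
		i += 3 // has no effect on the loop: range assigns the index afresh next iteration
		_ = i
	}
	return n
}

func main() {
	// Every codepoint is still visited despite the i += 3 in the body.
	fmt.Println(iterationsDespiteIncrement("abcdef"))
}
```

Even if the assignment did carry over, the index is a byte offset, so adding 3 could land in the middle of a multi-byte codepoint.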
I asked this question on the golang-nuts mailing list, and it was answered, courtesy of some of the helpful folks on the list. However, someone messaged me suggesting I create a self-answered question on Stack Overflow for this, to save the next person with the same issue some trouble. That's what this is.
Answer 1
Score: 6
I'd consider avoiding the conversion to []rune and coding this directly.
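skip := 0
for _, c := range myString {
	if skip > 0 {
		// Still inside a skipped run: consume this rune without processing it.
		skip--
		continue
	}
	// doSomething returns how many of the following codepoints to skip.
	skip = doSomething(c)
}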
It looks inefficient to skip runes one by one like this, but it's the same amount of work as the conversion to []rune would be. The advantage of this code is that it avoids allocating the rune slice, which uses 4 bytes per rune and so can be up to 4 times larger than the original string (the exact ratio depends on how many multi-byte code points you have). Of course, converting to []rune is a bit simpler, so you may prefer that.
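For anyone who wants to try the skip-counter approach end to end, here is a runnable sketch. The names processWithSkip and digitSkip are assumptions made for illustration: digitSkip stands in for doSomething and treats a digit d as "skip the next d codepoints".

```go
package main

import (
	"fmt"
	"strings"
)

// processWithSkip (hypothetical helper) ranges over s without converting it
// to []rune. After each visited rune, skipAfter says how many of the
// following codepoints to skip. It returns the runes that were visited.
func processWithSkip(s string, skipAfter func(rune) int) string {
	var visited strings.Builder
	skip := 0
	for _, c := range s {
		if skip > 0 {
			skip-- // decoded by range, but not processed
			continue
		}
		visited.WriteRune(c)
		skip = skipAfter(c)
	}
	return visited.String()
}

// digitSkip is an illustrative stand-in for doSomething: a digit d means
// "skip the next d codepoints"; any other rune skips nothing.
func digitSkip(c rune) int {
	if c >= '0' && c <= '9' {
		return int(c - '0')
	}
	return 0
}

func main() {
	// The skipped codepoints here are multi-byte, which the range loop
	// handles without any byte-offset arithmetic.
	fmt.Println(processWithSkip("a2\u00e9\u00fcb1\u010dd", digitSkip)) // prints "a2b1d"
}
```

No rune slice is allocated here; the only extra state is the integer counter.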
Answer 2
Score: 2
It turns out this can be done quite easily by converting the string to a slice of runes.
runes := []rune(myString)
for i := 0; i < len(runes); i++ {
	// doSomething returns how many of the following runes to skip.
	jumpHowFarAhead := doSomething(runes[i])
	i += jumpHowFarAhead
}
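A self-contained version of this approach might look like the sketch below. The names processRunes and digitSkip are illustrative assumptions: digitSkip plays the role of doSomething and is assumed to return a non-negative count.

```go
package main

import "fmt"

// processRunes (hypothetical helper) converts s to []rune so that the loop
// index counts codepoints, then jumps ahead by whatever skipAfter returns.
// It returns the runes that were visited.
func processRunes(s string, skipAfter func(rune) int) string {
	runes := []rune(s) // allocates 4 bytes per codepoint
	visited := make([]rune, 0, len(runes))
	for i := 0; i < len(runes); i++ {
		visited = append(visited, runes[i])
		// Jump over the skipped runes; the loop's i++ then moves to the
		// next rune to visit. Overshooting len(runes) just ends the loop.
		i += skipAfter(runes[i])
	}
	return string(visited)
}

// digitSkip is an illustrative stand-in for doSomething: a digit d means
// "skip the next d codepoints"; any other rune skips nothing.
func digitSkip(c rune) int {
	if c >= '0' && c <= '9' {
		return int(c - '0')
	}
	return 0
}

func main() {
	fmt.Println(processRunes("a2\u00e9\u00fcb1\u010dd", digitSkip)) // prints "a2b1d"
}
```

Because the index is only read after the loop condition has been re-checked, a skip count that runs past the end of the slice does not cause an out-of-range access.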