How to loop through a UTF-8 string in Go?
Question
I have a string in Chinese:
x = "你好"
I'd like to loop through it and do something with each character in it, something like:
for i, len := 0, len(x); i < len; i++ {
foo( x[i] ) // do sth.
}
I found that len(x) returns 6 instead of 2. After some googling I found the function RuneCountInString, which returns the real length of the string, but I still don't know how to loop so that x[i] gets the right character, for example x[0] == '你'.
Thanks
Answer 1
Score: 33
Use range.
x := "你好"
for _, c := range x {
    // c is a rune (one Unicode code point); do something with c
}
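As a minimal sketch of what the range loop yields, the program below prints both values it produces for each iteration: the byte offset where the character starts and the rune itself. The helper name runeOffsets is hypothetical and exists only for illustration.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper: it returns the byte offset at
// which each character (rune) of s starts.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	x := "你好"

	// len counts bytes, not characters: each of these characters
	// takes 3 bytes in UTF-8.
	fmt.Println(len(x)) // 6

	for i, c := range x {
		// i is the byte offset of the rune, c is the decoded rune.
		fmt.Printf("byte offset %d: %c\n", i, c)
	}

	fmt.Println(runeOffsets(x)) // [0 3]
}
```

Note that the index jumps from 0 to 3, so it is a position in the underlying bytes rather than a character count.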
If you want random access, you'll need to use code unit indexes (byte offsets, since Go strings are UTF-8) rather than character indexes. Fortunately, there is no good reason to need character indexes, so code unit indexes are fine.
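One way this might look in practice, assuming the byte offset comes from a search such as strings.Index: decode the character that starts at that offset with utf8.DecodeRuneInString. The helper runeAt is a hypothetical name. The []rune conversion at the end is not part of the answer above; it is shown only as an option for the case where character indexes are wanted anyway, at the cost of copying the string.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// runeAt is a hypothetical helper: it decodes the rune that starts at
// byte offset i of s and reports how many bytes it occupies.
func runeAt(s string, i int) (rune, int) {
	return utf8.DecodeRuneInString(s[i:])
}

func main() {
	x := "你好, Go"

	// strings.Index returns a byte offset (a code unit index).
	i := strings.Index(x, "好")
	r, size := runeAt(x, i)
	fmt.Printf("offset %d: %c (%d bytes)\n", i, r, size)

	// Slicing with byte offsets keeps the string valid as long as the
	// offsets fall on character boundaries.
	fmt.Println(x[i+size:])

	// Alternative: convert to []rune to index by character.
	// This allocates and copies the whole string.
	rs := []rune(x)
	fmt.Println(len(rs), string(rs[0]))
}
```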
Most languages have the exact same problem. For example, Java and C# use UTF-16, which is also a variable-length encoding (but some people pretend it isn't).
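The variable-length nature of UTF-16 can be checked from Go with the standard unicode/utf16 package. This is only a sketch; utf16Len is a hypothetical helper that counts how many 16-bit code units a string needs.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf16Len is a hypothetical helper: it returns the number of UTF-16
// code units needed to encode s.
func utf16Len(s string) int {
	return len(utf16.Encode([]rune(s)))
}

func main() {
	// A character in the Basic Multilingual Plane fits in one unit.
	fmt.Println(utf16Len("你")) // 1

	// A character outside it needs a surrogate pair, i.e. two units.
	fmt.Println(utf16Len("😀")) // 2
}
```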
See the UTF-8 Manifesto for more information about why Go uses UTF-8.