How to loop through a UTF-8 string in Go?

Question

I have a string in Chinese:

x = "你好"

I'd like to loop through it and do something with each character in it, something like:

for i, len := 0, len(x); i < len; i++ {
    foo( x[i] ) // do something
}

I found that len(x) returns 6 instead of 2. After some searching I found the function RuneCountInString, which returns the number of characters in the string, but I still don't know how to write the loop so that x[i] yields the right character, for example x[0] == '你'.

Thanks

Answer 1

Score: 33

Use range.

x := "你好"
for _, c := range x {
    // do something with c (c is a rune, one per character)
}
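As a minimal, self-contained sketch of the loop above: range over a string decodes one UTF-8 character per iteration and yields its starting byte offset together with the rune. The helper name runeOffsets is made up for illustration and is not part of any library.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper: it returns the byte offset at
// which each character of s starts, collected with a range loop.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	x := "你好"
	for i, c := range x {
		// i is the byte offset where the character starts, not a
		// character index; c is the decoded rune.
		fmt.Printf("byte offset %d: %c (U+%04X)\n", i, c, c)
	}
	fmt.Println(runeOffsets(x)) // [0 3]
}
```

Note that the index advances by 3 here, because each of these two characters occupies three bytes in UTF-8.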

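If you want random access, you'll need to use code unit indexes (byte offsets, in Go's case) rather than character indexes. This is why x[i] returns a single byte and len(x) returns 6 for "你好". Fortunately, there is rarely a good reason to need character indexes, so byte offsets are usually fine.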
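One way to picture random access by byte offset, as a sketch rather than the only approach: decode the character that starts at a known offset with utf8.DecodeRuneInString from the standard unicode/utf8 package. The helper runeAt is a hypothetical name introduced here. If character indexes really are required, converting to []rune works, at the cost of copying the whole string.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt is a hypothetical helper: it decodes the character that starts
// at byte offset i of s and also returns its width in bytes.
// The caller is assumed to pass an offset that begins a character.
func runeAt(s string, i int) (rune, int) {
	return utf8.DecodeRuneInString(s[i:])
}

func main() {
	x := "你好"
	fmt.Println(len(x))                    // 6 (bytes)
	fmt.Println(utf8.RuneCountInString(x)) // 2 (characters)

	r, size := runeAt(x, 3)
	fmt.Printf("%c %d\n", r, size) // 好 3

	// Character indexing via []rune: simple, but allocates and is O(n).
	rs := []rune(x)
	fmt.Println(rs[0] == '你') // true
}
```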

Most languages have the exact same problem. For example, Java and C# use UTF-16, which is also a variable-length encoding (but some people pretend it isn't).
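To illustrate the point about UTF-16 without leaving Go, the standard unicode/utf16 package can encode a rune into UTF-16 code units. The sketch below (utf16Units is a made-up helper name) shows that a character outside the Basic Multilingual Plane needs two code units, which is what makes UTF-16 variable-length.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf16Units is a hypothetical helper: it reports how many 16-bit code
// units are needed to encode r in UTF-16.
func utf16Units(r rune) int {
	return len(utf16.Encode([]rune{r}))
}

func main() {
	fmt.Println(utf16Units('你'))          // 1
	fmt.Println(utf16Units('\U0001F600')) // 2 (a surrogate pair)
}
```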

See the UTF-8 Manifesto for more information about why Go uses UTF-8.

  • Posted by huangapple on October 5, 2012, 13:44:55
  • When reposting, please keep this link: https://go.coder-hub.com/12740180.html