When using range to iterate over a string, the type differs between the value returned by range and accessing the rune directly using the index. Why?

Question

When I iterate over a string to access the runes, I have two options:

package main

import "fmt"

func main() {
	s := "AB"
	for i, v := range s {
		// access the copy of the rune via value
		fmt.Printf("%T\n", v) // prints int32
		// access the rune via its index directly through s
		fmt.Printf("%T\n", s[i]) // prints uint8
	}
}

I do understand that int32 is the type behind rune, while uint8 is the type behind byte, meaning that one time I get a rune and the other time I get a byte. But why?
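The Go specification declares byte as an alias for uint8 and rune as an alias for int32, so they are identical types and %T reports the underlying names. A minimal sketch of this; the helper typeNames is hypothetical, written only for illustration:

```go
package main

import "fmt"

// typeNames (hypothetical helper) reports what %T prints for a rune and a byte.
func typeNames() (string, string) {
	var r rune = 'A'
	var b byte = 'A'
	return fmt.Sprintf("%T", r), fmt.Sprintf("%T", b)
}

func main() {
	// rune and byte are aliases, not distinct types: these assignments
	// compile without any conversion because the types are identical.
	var r rune = 'A'
	var i32 int32 = r
	var b byte = 'A'
	var u8 uint8 = b
	fmt.Println(i32, u8) // 65 65

	fmt.Println(typeNames()) // int32 uint8
}
```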

For context: in this case it's not a problem, because ASCII characters fit in a uint8, but with an emoji, for example, a single uint8 is not enough, so the value is wrong.
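To make the emoji case concrete, here is a small sketch. The sample string "A😀" and the helper describe are illustrative choices, not part of the question. For each rune it prints the byte offset from range, the decoded rune, the single byte s[i] at that offset, and the rune's UTF-8 length (via unicode/utf8.RuneLen). The emoji occupies four bytes, and s[i] yields only the first of them:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe (hypothetical helper) reports, for each rune in s, the byte
// offset given by range, the decoded rune, the byte s[i] at that offset,
// and how many bytes the rune occupies in UTF-8.
func describe(s string) []string {
	var out []string
	for i, v := range s {
		out = append(out, fmt.Sprintf("i=%d rune=%U byte=0x%X len=%d",
			i, v, s[i], utf8.RuneLen(v)))
	}
	return out
}

func main() {
	for _, line := range describe("A😀") {
		fmt.Println(line)
	}
	// i=0 rune=U+0041 byte=0x41 len=1
	// i=1 rune=U+1F600 byte=0xF0 len=4
}
```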

Answer 1

Score: 5

Because they're different operations that do different things. Ranging over a string iterates over runes, while indexing a string accesses bytes. Strings are stored as UTF-8, where a rune takes one to four bytes, so there is no constant-time access to a given rune in the middle. To be clear: s[1] is not the second rune, code point, or character of s; it's the second byte. In your loop, i is the byte offset at which each rune starts, so s[i] is only the first byte of that rune's encoding.
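A sketch of the byte-versus-rune distinction, assuming the sample string "héllo", where 'é' (U+00E9) is encoded as the two bytes 0xC3 0xA9. The helpers byteAt, runeOffsets and nthRune are hypothetical, written only to illustrate the point: indexing is a constant-time byte lookup, range reports byte offsets, and reaching the n-th rune requires walking from the start:

```go
package main

import "fmt"

// byteAt returns the i-th byte of s; string indexing is O(1) and byte-based.
func byteAt(s string, i int) byte { return s[i] }

// runeOffsets returns the byte offset where each rune of s starts,
// exactly as a range loop over the string reports them.
func runeOffsets(s string) []int {
	var offs []int
	for i := range s {
		offs = append(offs, i)
	}
	return offs
}

// nthRune walks s from the start until it reaches the n-th rune, which is
// why positional rune access on a string is O(n) rather than O(1).
func nthRune(s string, n int) (rune, bool) {
	k := 0
	for _, r := range s {
		if k == n {
			return r, true
		}
		k++
	}
	return 0, false
}

func main() {
	s := "héllo"
	fmt.Println(len(s))                // 6: length in bytes, not runes
	fmt.Printf("0x%X\n", byteAt(s, 1)) // 0xC3: first byte of 'é'
	fmt.Println(runeOffsets(s))        // [0 1 3 4 5]
	r, _ := nthRune(s, 1)
	fmt.Printf("%c\n", r) // é
}
```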

If you want to iterate over bytes, you can use range([]byte(s)). If you want random access to runes, you can use []rune(s) (better to convert once and index into the result many times; otherwise you might end up on Accidentally Quadratic), or else work out how to do what you want with the functions in the strings package.
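A minimal sketch of the three approaches, again assuming the sample string "héllo". The helpers hexBytes, reverseRunes and runeAtQuadratic are hypothetical; strings.IndexRune is a real standard-library function and, like string indexing, works in byte offsets. runeAtQuadratic shows the pattern to avoid: each call converts the whole string, so calling it once per position makes a loop O(n²):

```go
package main

import (
	"fmt"
	"strings"
)

// hexBytes visits every byte of s by ranging over []byte(s).
func hexBytes(s string) string {
	parts := make([]string, 0, len(s))
	for _, b := range []byte(s) {
		parts = append(parts, fmt.Sprintf("%02x", b))
	}
	return strings.Join(parts, " ")
}

// reverseRunes converts s to []rune once (O(n)) and then indexes the
// slice freely; every rs[i] access afterwards is O(1).
func reverseRunes(s string) string {
	rs := []rune(s)
	for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
		rs[i], rs[j] = rs[j], rs[i]
	}
	return string(rs)
}

// runeAtQuadratic is the pattern to avoid: it converts the entire string
// on every call, so using it inside a loop over positions costs O(n^2).
func runeAtQuadratic(s string, i int) rune {
	return []rune(s)[i]
}

func main() {
	s := "héllo"
	fmt.Println(hexBytes(s))                  // 68 c3 a9 6c 6c 6f
	fmt.Println(reverseRunes(s))              // olléh
	fmt.Printf("%c\n", runeAtQuadratic(s, 1)) // é (correct, but slow in a loop)
	// strings functions also report byte offsets, not rune positions.
	fmt.Println(strings.IndexRune(s, 'l')) // 3
}
```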

huangapple
  • Posted on June 14, 2023, 00:06:15
  • When reposting, please keep a link to this article: https://go.coder-hub.com/76466800.html