Type of string elements is uint8 when indexing and int32 when ranging by value
Question
Here I am checking the type of each element of the string s using the index expression s[k] and the range value v, but the two give different results. With the index I get the type uint8, but with the range value I get int32.
package main

import "fmt"

func main() {
	s := "AaBbCcXxYyZz"
	for k, v := range s {
		// s[k] is the byte at index k; v is the code point that starts there.
		fmt.Printf("%v\t%T\t%s\n", s[k], s[k], string(s[k]))
		fmt.Printf("%v\t%T\t%s\n", v, v, string(v))
	}
}
Answer 1
Score: 1
The loop for k, v := range s {} iterates over Unicode code points. In Go they are called runes and are represented as 32-bit signed integers (rune is an alias for int32):
> For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
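The last sentence of the quoted rule can be checked with a small sketch. The helper name runesOf and the sample string are illustrative assumptions, not part of the original answer; the string deliberately contains a lone 0xFF byte, which is never valid in UTF-8.

```go
package main

import "fmt"

// runesOf collects the second value that range yields on each iteration over s.
// (Hypothetical helper, introduced only for this illustration.)
func runesOf(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

func main() {
	// "a", then an invalid byte 0xFF, then "b".
	s := "a\xffb"
	for i, r := range s {
		fmt.Printf("index %d: %U\n", i, r)
	}
	// Prints:
	// index 0: U+0061
	// index 1: U+FFFD
	// index 2: U+0062
	fmt.Println(len(runesOf(s))) // 3
}
```

The invalid byte is reported as the replacement character, and the index advances by exactly one byte before the next iteration.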
The index expression s[k] returns a single byte of the string's internal UTF-8 representation, so its type is byte, which is an alias for uint8.
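As a rough illustration of the two types side by side, the sketch below formats both values for every iteration. The helper name describe and its output format are assumptions made for this example only.

```go
package main

import "fmt"

// describe returns one line per code point of s, showing the indexed byte
// and the ranged rune together with their types.
// (Hypothetical helper, introduced only for this illustration.)
func describe(s string) []string {
	var lines []string
	for k, v := range s {
		lines = append(lines, fmt.Sprintf("%d %T | %d %T", s[k], s[k], v, v))
	}
	return lines
}

func main() {
	for _, line := range describe("Aa") {
		fmt.Println(line)
	}
	// Prints:
	// 65 uint8 | 65 int32
	// 97 uint8 | 97 int32

	// byte and uint8 are the same type, as are rune and int32,
	// so these assignments need no conversion.
	var b byte = "A"[0]
	var u uint8 = b
	var r rune = 'A'
	var i int32 = r
	fmt.Println(u, i) // 65 65
}
```

For ASCII text the numeric values coincide, which is why only the type differs in the question's output.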
The difference is easy to see with multibyte scripts such as Chinese. Try iterating over the string "給祭断情試紀脱答条証行日稿" (a meaningless lorem ipsum phrase in Chinese):
s[0]: 231 uint8 ç
:32102 int32 給
s[3]: 231 uint8 ç
:31085 int32 祭
s[6]: 230 uint8 æ
:26029 int32 断
See the step of 3 between the values of k? That is because the UTF-8 encoding of each of these Chinese characters occupies 3 bytes.
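The byte offsets can also be inspected directly. The following is a minimal sketch using the standard unicode/utf8 package; the helper name byteOffsets is an assumption for this example, and only the first three characters of the sample string are used.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteOffsets returns the byte index at which each code point of s starts.
// (Hypothetical helper, introduced only for this illustration.)
func byteOffsets(s string) []int {
	var offs []int
	for k := range s {
		offs = append(offs, k)
	}
	return offs
}

func main() {
	s := "給祭断"
	fmt.Println(byteOffsets(s))            // [0 3 6]
	fmt.Println(len(s))                    // 9 (bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 3 (code points)
	fmt.Println(utf8.RuneLen('給'))         // 3 (bytes needed to encode this rune)

	// Converting to []rune makes the index address code points instead of bytes.
	r := []rune(s)
	fmt.Println(string(r[1])) // 祭
}
```

If indexing by character position is what is wanted, converting to []rune first is one option, at the cost of copying the string.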
Full example: https://go.dev/play/p/-44NZMojcgq