Type of string elements is uint8 when using the index and int32 when using the range value

Question

Here I am checking the type of each element of the string s, once through the index expression s[k] and once through the range value v, and the two report different types. Using the index I get uint8, but for the range value I get int32.

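package main

import "fmt"

func main() {
	s := "AaBbCcXxYyZz"
	for k, v := range s {
		// s[k] is the byte at offset k (type uint8).
		fmt.Printf("%v\t%T\t%s\n", s[k], s[k], string(s[k]))
		// v is the code point starting at offset k (type rune, i.e. int32).
		fmt.Printf("%v\t%T\t%s\n", v, v, string(v))
	}
}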

Answer 1

Score: 1

The loop for k, v := range s {} iterates over Unicode code points, not bytes. In Go they are called runes, and rune is an alias for int32, a 32-bit signed integer:
> For a string value, the "range" clause iterates over the Unicode code points in the string starting at byte index 0. On successive iterations, the index value will be the index of the first byte of successive UTF-8-encoded code points in the string, and the second value, of type rune, will be the value of the corresponding code point. If the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.

Source: the Go language specification (For statements with range clause)
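The last sentence of the quoted rule, about invalid UTF-8, can be checked with a small sketch. The helper rangeSteps and the sample input are made up for illustration; the input relies on the fact that the byte 0xff never appears in valid UTF-8.

```go
package main

import "fmt"

// step records one iteration of a range loop over a string.
type step struct {
	Index int  // byte offset at which the iteration started
	Value rune // code point produced by the range clause
}

// rangeSteps (a hypothetical helper) collects what range yields for s.
func rangeSteps(s string) []step {
	var out []step
	for i, r := range s {
		out = append(out, step{Index: i, Value: r})
	}
	return out
}

func main() {
	// "a", one invalid byte, then "b".
	s := "a\xffb"
	for _, st := range rangeSteps(s) {
		fmt.Printf("index %d: %U\n", st.Index, st.Value)
	}
}
```

The middle iteration should report U+FFFD at index 1, and the following one should start at index 2, one byte later.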

The index expression s[k] returns the single byte at offset k in the string's underlying byte sequence. Its type is byte, which is an alias for uint8.
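A minimal sketch of the two types side by side. The helper typeNames is hypothetical; the only language facts it relies on are that byte is an alias for uint8 and rune is an alias for int32, which is why %T prints the underlying names.

```go
package main

import "fmt"

// typeNames (a hypothetical helper) reports the type names that %T prints
// for an indexed element and for a range value of s.
// It returns two empty strings when s is empty.
func typeNames(s string) (indexed, ranged string) {
	for k, v := range s {
		indexed = fmt.Sprintf("%T", s[k])
		ranged = fmt.Sprintf("%T", v)
	}
	return indexed, ranged
}

func main() {
	indexed, ranged := typeNames("AaBb")
	fmt.Println(indexed, ranged) // uint8 int32

	// Because these are aliases, no conversion is needed in either direction.
	s := "A"
	var b byte = s[0]
	var u uint8 = b
	var r rune = 'A'
	var i int32 = r
	fmt.Println(u, i) // 65 65
}
```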

The difference is easy to see with multibyte scripts such as Chinese. Try iterating over the string "給祭断情試紀脱答条証行日稿" (a meaningless lorem ipsum phrase in Chinese):

s[0]: 231	uint8	ç
     :32102	int32	給
s[3]: 231	uint8	ç
     :31085	int32	祭
s[6]: 230	uint8	æ
     :26029	int32	断

Notice that k advances by 3 between iterations. That is because the UTF-8 encoding of each of these Chinese characters occupies 3 bytes.
Full example: https://go.dev/play/p/-44NZMojcgq
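As a follow-up sketch (not part of the linked playground example), the byte offsets can be collected directly, and the string can be converted to []rune when indexing by character rather than by byte is wanted. The helper byteOffsets is a made-up name, and the sample is shortened to the first three characters of the string above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteOffsets (a hypothetical helper) returns the starting byte index of
// every code point in s, as seen by a range loop.
func byteOffsets(s string) []int {
	var offsets []int
	for k := range s {
		offsets = append(offsets, k)
	}
	return offsets
}

func main() {
	s := "給祭断"
	fmt.Println(byteOffsets(s))            // [0 3 6]
	fmt.Println(len(s))                    // 9 (bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 3 (code points)

	// Converting to []rune allows indexing by code point instead of by byte.
	runes := []rune(s)
	fmt.Println(string(runes[1])) // 祭
}
```

Note that the []rune conversion copies the string, so for long inputs a range loop is usually the cheaper way to walk the characters.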

huangapple
  • Posted on November 6, 2022 at 14:35:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74333752.html