How do int-to-string casts work in Go?
Question
I only started Go today, so this may be obvious, but I couldn't find anything on it.
What does var x uint64 = 0x12345678; y := string(x) give y?
I know var x uint8 = 65; y := string(x) would give y the byte 65, character A, and common sense would suggest (since types larger than uint8 are allowed to be cast to strings) that they would simply be packed into native byte order (i.e. little-endian) and assigned to the variable.
This does not seem to be the case:
hex.EncodeToString([]byte(y)) ==> "efbfbd"
My first thought was that this is an address with the last byte left off because of some odd null terminator, but if I allocate two x and y variables with two different values and print them out, I get the same result.
var x, x2 uint64 = 0x10000000, 0x20000000
y, y2 := string(x), string(x2)
fmt.Println(hex.EncodeToString([]byte(y))) // "efbfbd"
fmt.Println(hex.EncodeToString([]byte(y2))) // "efbfbd"
Maddeningly, I can't find the implementation for the string type anywhere, although I probably haven't looked hard enough.
Answer 1
Score: 5
This is covered in the Spec: Conversions: Conversions to and from a string type:
> Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer. Values outside the range of valid Unicode code points are converted to "\uFFFD".
So effectively, when you convert a numeric value to string, it can only yield a string containing one rune (character). And since Go stores strings as UTF-8 encoded byte sequences in memory, that is what you will see if you convert your string to []byte:
> Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.
When you try to convert the 0x12345678, 0x10000000 and 0x20000000 values to string, since they are outside the range of valid Unicode code points, per the spec they are converted to "\uFFFD", which in UTF-8 encoding is []byte{239, 191, 189}; when encoded as a hex string:
fmt.Println(hex.EncodeToString([]byte("\uFFFD"))) // Output: efbfbd
Or simply:
fmt.Printf("%x", "\uFFFD") // Output: efbfbd
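To make the rule above concrete, here is a minimal sketch that spells out the conversion by hand. The helper name intToString is hypothetical (it is not part of the standard library or the runtime); it only mirrors the behaviour the spec describes: a valid code point becomes its UTF-8 encoding, anything else becomes U+FFFD.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// intToString is a hypothetical helper that mirrors what the spec says
// string(integer) does: treat the value as a Unicode code point and
// return its UTF-8 encoding, or "\uFFFD" if it is not a valid code point.
func intToString(v int64) string {
	// Anything negative or above U+10FFFF is outside the code point range.
	if v < 0 || v > utf8.MaxRune {
		return string(utf8.RuneError) // U+FFFD, the replacement character
	}
	// string(rune) also maps surrogate halves (U+D800..U+DFFF) to U+FFFD.
	return string(rune(v))
}

func main() {
	values := []int64{65, 0x4E16, 0x12345678, 0x10000000, 0x20000000}
	for _, v := range values {
		s := intToString(v)
		// Print the input, the resulting string, and its bytes in hex.
		fmt.Printf("%#x -> %s (% x)\n", v, s, s)
	}
}
```

The three large values from the question all end up as the same three bytes, ef bf bd, which is why the two different inputs printed identical output.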
Read the blog post Strings, bytes, runes and characters in Go for more details about string internals.
By the way, since Go 1.5 the Go runtime is implemented (mostly) in Go, so these conversions are now implemented in Go and can be found in the runtime package: in runtime/string.go, look for the intstring() function.