How do int-to-string casts work in Go?

Question

I only started Go today, so this may be obvious but I couldn't find anything on it.

What does var x uint64 = 0x12345678; y := string(x) give y?

I know var x uint8 = 65; y := string(x) would give y the byte 65, character A, and common sense would suggest (since types larger than uint8 are allowed to be cast to strings) that they would simply be packed into native byte order (i.e. little-endian) and assigned to the variable.

This does not seem to be the case:

hex.EncodeToString([]byte(y)) ==> "efbfbd"

My first thought was that this is an address with the last byte left off because of some weird null terminator thingy, but if I allocate two x and y variables with two different values and print them out, I get the same result:

var x, x2 uint64 = 0x10000000, 0x20000000
y, y2 := string(x), string(x2)
fmt.Println(hex.EncodeToString([]byte(y))) // "efbfbd"
fmt.Println(hex.EncodeToString([]byte(y2))) // "efbfbd"

Maddeningly, I can't find the implementation for the string type anywhere, although I probably haven't looked hard enough.

Answer 1

Score: 5

This is covered in the Spec: Conversions: Conversions to and from a string type:

> Converting a signed or unsigned integer value to a string type yields a string containing the UTF-8 representation of the integer. Values outside the range of valid Unicode code points are converted to "\uFFFD".

So effectively, when you convert a numeric value to string, it can only yield a string containing a single rune (character). And since Go stores strings as UTF-8 encoded byte sequences in memory, that is what you will see if you convert your string to []byte:

> Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.
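To make the two quoted rules concrete, here is a minimal sketch that converts a few integers to strings and hex-encodes the resulting bytes. The helper name hexOfConversion is made up for this example, and the explicit rune(v) step is an addition: since Go 1.15, go vet flags string(x) for integer types other than rune and byte. For the uint32 values used here the result should match a direct string(v).

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// hexOfConversion (hypothetical helper) converts v to a string holding a
// single code point and returns the hex encoding of its UTF-8 bytes.
// rune(v) is written out explicitly so that go vet accepts the conversion.
func hexOfConversion(v uint32) string {
	s := string(rune(v)) // integer -> one-rune string, UTF-8 encoded
	return hex.EncodeToString([]byte(s))
}

func main() {
	fmt.Println(hexOfConversion(65))         // 41: the single byte for "A"
	fmt.Println(hexOfConversion(0xE9))       // c3a9: "é" needs two bytes in UTF-8
	fmt.Println(hexOfConversion(0x12345678)) // efbfbd: not a valid code point
	fmt.Println(hexOfConversion(0x10000000)) // efbfbd
	fmt.Println(hexOfConversion(0x20000000)) // efbfbd
}
```

Every value above U+10FFFF collapses to the same three bytes, which is why the two different inputs in the question print identical output.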

When you try to convert the values 0x12345678, 0x10000000 and 0x20000000 to string, they are outside the range of valid Unicode code points, so as per the spec they are converted to "\uFFFD", which in UTF-8 encoding is []byte{239, 191, 189}. Encoded as a hex string:

fmt.Println(hex.EncodeToString([]byte("\uFFFD"))) // Output: efbfbd

Or simply:

fmt.Printf("%x", "\uFFFD") // Output: efbfbd
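If what the question actually wanted was the decimal digits or the raw little-endian bytes of the integer, the standard library has explicit functions for both. The sketch below assumes that intent; decimalText, littleEndianBytes and littleEndianHex are illustrative names, while strconv.FormatUint and binary.LittleEndian.PutUint64 are the real library calls.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"strconv"
)

// decimalText returns the base-10 digits of v, e.g. 65 -> "65".
func decimalText(v uint64) string {
	return strconv.FormatUint(v, 10)
}

// littleEndianBytes packs v into 8 bytes, least significant byte first.
func littleEndianBytes(v uint64) []byte {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, v)
	return buf
}

// littleEndianHex hex-encodes the little-endian bytes of v for printing.
func littleEndianHex(v uint64) string {
	return hex.EncodeToString(littleEndianBytes(v))
}

func main() {
	var x uint64 = 0x12345678
	fmt.Println(decimalText(x))     // 305419896
	fmt.Println(littleEndianHex(x)) // 7856341200000000
}
```

Choosing the byte order explicitly through encoding/binary also avoids depending on the host machine's native order.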

Read the blog post Strings, bytes, runes and characters in Go for more details about string internals.

By the way, since Go 1.5 the Go runtime is implemented (mostly) in Go, so these conversions are implemented in Go as well and can be found in the runtime package: see runtime/string.go and look for the intstring() function.
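For a rough idea of what such a conversion has to do, here is a sketch written against the public unicode/utf8 package. It is not the runtime's source: encodeCodePoint is a hypothetical name, and the logic is only an approximation of the behaviour the spec describes (values that do not fit a rune, or that are not valid code points, become U+FFFD).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeCodePoint (hypothetical, not the runtime function) returns the
// UTF-8 encoding of v as a string, or "\uFFFD" when v is not a valid
// Unicode code point.
func encodeCodePoint(v int64) string {
	r := rune(v)
	if int64(r) != v {
		// v does not fit in a 32-bit rune, so it cannot be a code point.
		r = utf8.RuneError
	}
	buf := make([]byte, utf8.UTFMax)
	// EncodeRune itself writes the encoding of U+FFFD for runes that are
	// out of range or in the surrogate range.
	n := utf8.EncodeRune(buf, r)
	return string(buf[:n])
}

func main() {
	fmt.Printf("%x\n", encodeCodePoint(65))         // 41
	fmt.Printf("%x\n", encodeCodePoint(0x12345678)) // efbfbd
	fmt.Printf("%x\n", encodeCodePoint(-1))         // efbfbd
}
```

The size check matters for inputs such as 0x100000041, which would otherwise be truncated to 0x41 and silently come out as "A".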

huangapple
  • Published on January 15, 2016 at 17:56:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/34808465.html