Does Golang do any conversion when casting a byte slice to a string?


Question


Does Go do any conversion, or otherwise try to interpret the bytes, when converting a byte slice to a string? I've just tried it with a byte slice containing a null byte, and it looks like the string keeps the bytes as they are.

var test []byte
test = append(test, 'a')
test = append(test, 'b')
test = append(test, 0)
test = append(test, 'd')
s := string(test)      // convert the slice, null byte included
fmt.Println(s[2] == 0) // true: the null byte is still there

But what about byte slices that contain invalid Unicode code points or invalid UTF-8? Could the conversion fail, or could the data be corrupted?

Answer 1

Score: 10


References:
1: http://golang.org/ref/spec
2: http://golang.org/ref/spec#String_types
3: http://golang.org/ref/spec#Conversions

> The Go Programming Language Specification
>
> String types
>
> A string type represents the set of string values. A string value is a
> (possibly empty) sequence of bytes.
>
> Conversions
>
> Conversions to and from a string type
>
> Converting a slice of bytes to a string type yields a string whose
> successive bytes are the elements of the slice.
>
> string([]byte{'h', 'e', 'l', 'l', '\xc3', '\xb8'}) // "hellø"
> string([]byte{}) // ""
> string([]byte(nil)) // ""
>
> type MyBytes []byte
> string(MyBytes{'h', 'e', 'l', 'l', '\xc3', '\xb8'}) // "hellø"
>
> Converting a value of a string type to a slice of bytes type yields a
> slice whose successive elements are the bytes of the string.
>
> []byte("hellø") // []byte{'h', 'e', 'l', 'l', '\xc3', '\xb8'}
> []byte("") // []byte{}
>
> MyBytes("hellø") // []byte{'h', 'e', 'l', 'l', '\xc3', '\xb8'}

A string value is a (possibly empty) sequence of bytes. A string value may or may not represent Unicode characters encoded in UTF-8. There is no interpretation of the bytes during the conversion from byte slice to string nor from string to byte slice. Therefore, the bytes will not be changed and the conversions will not fail.
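As a minimal sketch of that claim, the program below pushes a byte slice containing a NUL byte and a byte that can never occur in valid UTF-8 through both conversions. The helper name `roundTrip` and the sample bytes are illustrative assumptions, not part of the spec text quoted above.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// roundTrip (hypothetical helper) converts b to a string and back again,
// returning both the intermediate string and the resulting slice.
func roundTrip(b []byte) (string, []byte) {
	s := string(b)
	return s, []byte(s)
}

func main() {
	// 0x00 is a NUL byte; 0xff never appears in valid UTF-8;
	// 0xc3 0xb8 is the valid two-byte encoding of "ø".
	in := []byte{'a', 0x00, 0xff, 0xc3, 0xb8}
	s, out := roundTrip(in)

	fmt.Println(len(s))              // 5: one string byte per slice element
	fmt.Printf("% x\n", s)           // 61 00 ff c3 b8
	fmt.Printf("% x\n", out)         // 61 00 ff c3 b8
	fmt.Println(utf8.ValidString(s)) // false: the string is simply not valid UTF-8
}
```

If a program needs to know whether the bytes are well-formed text, it has to ask explicitly, for example with `utf8.ValidString` as above; the conversion itself never checks.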

Answer 2

Score: 5


No, the conversion can't fail. Here's an example showing this (runnable in the Go Playground):

b := []byte{0x80}
s := string(b)
fmt.Println(s)
fmt.Println([]byte(s))
for _, c := range s {
	fmt.Println(c)
}

This prints:

�
[128]
65533

Note that ranging over invalid UTF-8 is well defined according to the Go spec:

> For a string value, the "range" clause iterates over the Unicode code
> points in the string starting at byte index 0. On successive
> iterations, the index value will be the index of the first byte of
> successive UTF-8-encoded code points in the string, and the second
> value, of type rune, will be the value of the corresponding code
> point. If the iteration encounters an invalid UTF-8 sequence, the
> second value will be 0xFFFD, the Unicode replacement character, and
> the next iteration will advance a single byte in the string.
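To make the quoted rule concrete, here is a small sketch that records the index and rune produced at each step of a `range` loop over a string mixing valid and invalid UTF-8. The `decoded` type, the `rangeSteps` helper, and the sample bytes are assumptions made for illustration.

```go
package main

import "fmt"

// decoded (hypothetical type) records one step of a range loop over a string.
type decoded struct {
	index int  // byte index at which this step started
	r     rune // rune reported by range for this step
}

// rangeSteps (hypothetical helper) collects the (index, rune) pairs
// that a range loop yields for s.
func rangeSteps(s string) []decoded {
	var steps []decoded
	for i, r := range s {
		steps = append(steps, decoded{i, r})
	}
	return steps
}

func main() {
	// 'a', the valid two-byte "ø" (0xc3 0xb8), two stray bytes, then 'z'.
	s := string([]byte{'a', 0xc3, 0xb8, 0x80, 0xff, 'z'})
	for _, st := range rangeSteps(s) {
		fmt.Printf("index %d: U+%04X\n", st.index, st.r)
	}
	// Prints:
	// index 0: U+0061
	// index 1: U+00F8
	// index 3: U+FFFD
	// index 4: U+FFFD
	// index 5: U+007A

	// The replacement character exists only in the loop variable;
	// the string itself still holds the original six bytes.
	fmt.Println(len(s), []byte(s)) // 6 [97 195 184 128 255 122]
}
```

Each invalid byte yields its own U+FFFD and advances the index by one, while the valid two-byte sequence advances it by two.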

huangapple
  • Posted on January 8, 2014, 09:46:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/20985536.html