Why are string and []byte treated differently when unmarshaling JSON?

Question

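From reading the documentation, my understanding was that a string is essentially an immutable []byte, and that it is easy to convert between the two.

However, this does not seem to hold when unmarshaling from JSON. Take the following example program: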

package main

import (
	"encoding/json"
	"fmt"
)

type STHRaw struct {
	Hash []byte `json:"hash"`
}

type STHString struct {
	Hash string `json:"hash"`
}

func main() {
	bytes := []byte(`{"hash": "nuyHN9wx4lZL2L3Ir3dhZpmggTQEIHEZcC3DUNCtQsk="}`)

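	// Unmarshal the JSON document into the struct with a string field.
	stringHead := new(STHString)
	if err := json.Unmarshal(bytes, stringHead); err != nil {
		fmt.Println("unmarshal into STHString failed:", err)
		return
	}

	// Unmarshal the same document into the struct with a []byte field.
	rawHead := new(STHRaw)
	if err := json.Unmarshal(bytes, rawHead); err != nil {
		fmt.Println("unmarshal into STHRaw failed:", err)
		return
	}

	fmt.Printf("String:\t\t%x\n", stringHead.Hash)
	fmt.Printf("Raw:\t\t%x\n", rawHead.Hash)
	fmt.Printf("Raw to string:\t%x\n", string(rawHead.Hash))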
}

This gives the following output:

String:			6e7579484e397778346c5a4c324c3349723364685a706d67675451454948455a63433344554e437451736b3d
Raw:			9eec8737dc31e2564bd8bdc8af77616699a0813404207119702dc350d0ad42c9
Raw to string:	9eec8737dc31e2564bd8bdc8af77616699a0813404207119702dc350d0ad42c9

I expected to get the same value in all three cases.

What is the difference?

Answer 1

Score: 4

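The designers of the encoding/json package decided that applications must provide valid UTF-8 text in string values, while []byte values may hold arbitrary byte sequences. To guarantee that the resulting JSON string is valid UTF-8, the package base64-encodes []byte values when marshaling and base64-decodes them when unmarshaling. In the example, the string field therefore holds the base64 text itself, while the []byte field holds the 32 decoded bytes.

The encoding of []byte values is described in the documentation for the Marshal function.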
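
To make the relationship concrete, here is a minimal sketch that reuses the two struct types from the question. The helper name decodeBoth is made up for illustration. It decodes the string field by hand with base64.StdEncoding and compares the result with what Unmarshal puts into the []byte field.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

type STHRaw struct {
	Hash []byte `json:"hash"`
}

type STHString struct {
	Hash string `json:"hash"`
}

// decodeBoth (a hypothetical helper) unmarshals doc into both struct types.
// It returns the []byte field and the manually base64-decoded string field,
// both formatted as hex.
func decodeBoth(doc []byte) (string, string, error) {
	var raw STHRaw
	if err := json.Unmarshal(doc, &raw); err != nil {
		return "", "", err
	}
	var str STHString
	if err := json.Unmarshal(doc, &str); err != nil {
		return "", "", err
	}
	// The string field still holds the base64 text; decode it by hand.
	decoded, err := base64.StdEncoding.DecodeString(str.Hash)
	if err != nil {
		return "", "", err
	}
	return fmt.Sprintf("%x", raw.Hash), fmt.Sprintf("%x", decoded), nil
}

func main() {
	doc := []byte(`{"hash": "nuyHN9wx4lZL2L3Ir3dhZpmggTQEIHEZcC3DUNCtQsk="}`)
	rawHex, strHex, err := decodeBoth(doc)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("[]byte field:         ", rawHex)
	fmt.Println("string field, decoded:", strHex)
	fmt.Println("equal:", rawHex == strHex)
}
```

So if you want the string form, either keep the string field and decode it yourself, or use the []byte field and let the package do the decoding.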

This decision was not dictated by the design of the Go language: a string can contain arbitrary byte sequences, and a []byte can contain valid UTF-8 text.
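
As a small illustration of that point, the sketch below puts non-UTF-8 bytes into a string and checks whether the value survives a trip through JSON. The helper name roundTrip and the sample bytes are assumptions chosen for the example. It relies on the documented behavior that Marshal coerces string values to valid UTF-8.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// roundTrip (a hypothetical helper) marshals s as a JSON string and
// unmarshals the result back into a Go string.
func roundTrip(s string) (string, error) {
	out, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	var back string
	if err := json.Unmarshal(out, &back); err != nil {
		return "", err
	}
	return back, nil
}

func main() {
	// A Go string holding bytes that are not valid UTF-8.
	s := string([]byte{0x9e, 0xec, 0x87, 0x37})
	fmt.Println("string is valid UTF-8:", utf8.ValidString(s))

	back, err := roundTrip(s)
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	// Invalid bytes are replaced during marshaling, so the value changes.
	fmt.Println("string survives JSON round trip:", back == s)

	// A []byte holding ordinary UTF-8 text.
	b := []byte("héllo")
	fmt.Println("[]byte is valid UTF-8:", utf8.Valid(b))
}
```

This is why arbitrary binary data such as a hash is safer in a []byte field: the base64 step keeps every byte intact.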

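The designers could have added a flag to the field tag to indicate whether a string or []byte value should be encoded, and with which encoder, but that is not what they did.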
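
Although no such tag exists, one possible workaround is a named type that implements encoding.TextMarshaler and encoding.TextUnmarshaler, which encoding/json honors. The sketch below is only an illustration under that assumption. The names HexBytes, STHHex, encodeHash and decodeHash are hypothetical, and hex is just one encoder you might pick.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// HexBytes is a hypothetical byte slice type that appears in JSON
// as a hex string instead of base64.
type HexBytes []byte

// MarshalText implements encoding.TextMarshaler.
func (h HexBytes) MarshalText() ([]byte, error) {
	return []byte(hex.EncodeToString(h)), nil
}

// UnmarshalText implements encoding.TextUnmarshaler.
func (h *HexBytes) UnmarshalText(text []byte) error {
	decoded, err := hex.DecodeString(string(text))
	if err != nil {
		return err
	}
	*h = decoded
	return nil
}

type STHHex struct {
	Hash HexBytes `json:"hash"`
}

// encodeHash marshals b as the hash field of an STHHex document.
func encodeHash(b []byte) (string, error) {
	out, err := json.Marshal(STHHex{Hash: HexBytes(b)})
	return string(out), err
}

// decodeHash unmarshals doc and returns the hash bytes formatted as hex.
func decodeHash(doc []byte) (string, error) {
	var v STHHex
	if err := json.Unmarshal(doc, &v); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", []byte(v.Hash)), nil
}

func main() {
	doc, err := encodeHash([]byte{0x9e, 0xec, 0x87, 0x37})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(doc) // {"hash":"9eec8737"}

	hash, err := decodeHash([]byte(doc))
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Println(hash) // 9eec8737
}
```

The choice of encoder lives in the type rather than in the struct tag, so every field of that type is encoded the same way.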

huangapple
  • Published on January 31, 2015 at 10:51:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/28247649.html