Go json.Unmarshal key with \u0000 \x00

Question

Here is the Go playground link.

Basically there are some special characters ('\u0000') in my JSON string key:

var j = []byte(`{"Page":1,"Fruits":["5","6"],"\u0000*\u0000_errorMessages":{"x":"123"},"*_successMessages":{"ok":"hi"}}`)

I want to Unmarshal it into a struct:

type Response1 struct {
    Page   int
    Fruits []string
    Msg    interface{} `json:"*_errorMessages"`
    Msg1   interface{} `json:"\\u0000*\\u0000_errorMessages"`
    Msg2   interface{} `json:"\u0000*\u0000_errorMessages"`
    Msg3   interface{} `json:"\0*\0_errorMessages"`
    Msg4   interface{} `json:"\\0*\\0_errorMessages"`
    Msg5   interface{} `json:"\x00*\x00_errorMessages"`
    Msg6   interface{} `json:"\\x00*\\x00_errorMessages"`
    SMsg   interface{} `json:"*_successMessages"`
}

I tried many variations, but none of them works.
This link might help: golang.org/src/encoding/json/encode_test.go.

Answer 1

Score: 7

Short answer: With the current json implementation it is not possible using only struct tags.

Note: It's an implementation restriction, not a specification restriction. (It's the restriction of the json package implementation, not the restriction of the struct tags specification.)


Some background: you specified your tags with a raw string literal:

> The value of a raw string literal is the string composed of the uninterpreted (implicitly UTF-8-encoded) characters between the quotes...

So the compiler does no unescaping or unquoting of the content of a raw string literal.
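A quick way to see this is to compare the same escape sequence in a raw and in an interpreted string literal. This is a minimal sketch; the constant names are made up for illustration:

```go
package main

import "fmt"

// In a raw string literal (backquotes) the escape is not interpreted:
// this is the six characters \ u 0 0 0 0.
const rawEscape = `\u0000`

// In an interpreted string literal (double quotes) the compiler decodes
// the escape into a single zero byte.
const interpretedEscape = "\u0000"

func main() {
	fmt.Println(len(rawEscape))                         // 6
	fmt.Println(len(interpretedEscape))                 // 1
	fmt.Printf("%q %q\n", rawEscape, interpretedEscape) // "\\u0000" "\x00"
}
```

So inside a struct tag, which is normally written as a raw string literal, \u0000 still reaches the reflect package as plain backslash-u text.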

The convention for struct tag values, quoted from reflect.StructTag:

> By convention, tag strings are a concatenation of optionally space-separated key:"value" pairs. Each key is a non-empty string consisting of non-control characters other than space (U+0020 ' '), quote (U+0022 '"'), and colon (U+003A ':'). Each value is quoted using U+0022 '"' characters and Go string literal syntax.

What this means is that, by convention, tag values are a list of key:"value" pairs separated by spaces. Keys are quite restricted, but values may contain anything, and values (should) use "Go string literal syntax". This means the values are unquoted at runtime by a call to strconv.Unquote(), made from StructTag.Get() in source file reflect/type.go, currently line #809.

So there is no need to double the backslashes. See this simplified version of your example:

type Response1 struct {
	Page   int
	Fruits []string
	Msg    interface{} `json:"\u0000_abc"`
}

Now the following code:

t := reflect.TypeOf(Response1{})
fmt.Printf("%#v\n", t.Field(2).Tag)
fmt.Printf("%#v\n", t.Field(2).Tag.Get("json"))

Prints:

"json:\"\\u0000_abc\""
"\x00_abc"

As you can see, the value part for the json key is "\x00_abc", so it properly contains the zero character.

But how will the json package use this?

The json package uses the value returned by StructTag.Get() (from the reflect package), exactly what we did. You can see it in the json/encode.go source file, typeFields() function, currently line #1032. So far so good.

Then it calls the unexported json.parseTag() function, in json/tags.go source file, currently line #17. This cuts the part after the comma (which becomes the "tag options").
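The comma split can be sketched like this. splitJSONTag is a hypothetical helper that only mimics the behavior described above; it is not the unexported parseTag() itself, and strings.Cut requires Go 1.18 or later. Note that the zero character survives this step:

```go
package main

import (
	"fmt"
	"strings"
)

// splitJSONTag is a hypothetical helper: the name is everything before
// the first comma, and the "tag options" are everything after it.
func splitJSONTag(tag string) (name, options string) {
	name, options, _ = strings.Cut(tag, ",")
	return name, options
}

func main() {
	name, opts := splitJSONTag("\x00_abc,omitempty")
	fmt.Printf("%q %q\n", name, opts) // "\x00_abc" "omitempty"
}
```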

And finally the json.isValidTag() function is called with the previous value, in source file json/encode.go, currently line #731. This function checks the runes of the passed string and (besides a set of pre-defined allowed characters "!#$%&()*+-./:<=>?@[]^_{|}~ ") rejects everything that is not a Unicode letter or digit (as defined by unicode.IsLetter() and unicode.IsDigit()):

if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
    return false
}

'\u0000' is not one of the pre-defined allowed characters, and as you can guess by now, it is neither a letter nor a digit:

// The following code prints "INVALID":
c := '\u0000'
if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
	fmt.Println("INVALID")
}
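Put together, the check described above roughly amounts to the following sketch. isValidTagSketch is a hypothetical re-implementation based only on this description and the allowed character set quoted above; the real unexported function may differ between Go versions:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// allowedPunct is the set of pre-defined allowed characters quoted above.
const allowedPunct = "!#$%&()*+-./:<=>?@[]^_{|}~ "

// isValidTagSketch accepts a non-empty name whose runes are all letters,
// digits, or allowed punctuation, and rejects anything else.
func isValidTagSketch(s string) bool {
	if s == "" {
		return false
	}
	for _, c := range s {
		if strings.ContainsRune(allowedPunct, c) {
			continue
		}
		if !unicode.IsLetter(c) && !unicode.IsDigit(c) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isValidTagSketch("*_errorMessages"))         // true
	fmt.Println(isValidTagSketch("\x00*\x00_errorMessages")) // false
}
```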

And since isValidTag() returns false, the name (which is the value for the json key, without the "tag options" part) will be discarded (name = "") and not used. So no match will be found for the struct field whose tag contains a Unicode zero character.

As an alternative, use a map, a custom json.Unmarshaler, or json.RawMessage.

But I would strongly discourage using such ugly JSON keys. I understand you are likely just trying to parse such a JSON response and it may be out of your control, but you should push back against these keys, as they will only cause more problems later on (e.g. if they are stored in a database, it will be very hard to spot the '\u0000' characters when inspecting records, as they may be displayed as nothing).

Answer 2

Score: 0

You cannot do it that way, due to: http://golang.org/ref/spec#Struct_types

But you can unmarshal into a map[string]interface{} and then check the key names of that map with a regexp.
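A minimal sketch of that approach follows. The pattern assumes keys of the form \x00*\x00_name, as in the question, and extractNULKeys is a hypothetical helper name. Go's regexp syntax accepts the \x00 hex escape for the zero character:

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// nulPrefix matches keys of the assumed form "\x00*\x00_name" and
// captures name.
var nulPrefix = regexp.MustCompile(`^\x00\*\x00_(.+)$`)

// extractNULKeys unmarshals into a map and returns the values of all
// matching keys, indexed by the captured suffix.
func extractNULKeys(data []byte) (map[string]interface{}, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	out := make(map[string]interface{})
	for k, v := range m {
		if sub := nulPrefix.FindStringSubmatch(k); sub != nil {
			out[sub[1]] = v
		}
	}
	return out, nil
}

func main() {
	j := []byte(`{"Page":1,"\u0000*\u0000_errorMessages":{"x":"123"},"*_successMessages":{"ok":"hi"}}`)
	found, err := extractNULKeys(j)
	if err != nil {
		panic(err)
	}
	fmt.Println(found["errorMessages"]) // map[x:123]
}
```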

Answer 3

Score: 0

I don't think this is possible with struct tags. The best thing you can do is unmarshal it into map[string]interface{} and then get the values manually:

var b = []byte(`{"\u0000abc":42}`)
var m map[string]interface{}
err := json.Unmarshal(b, &m)
if err != nil {
	panic(err)
}
fmt.Println(m, m["\x00abc"])
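If you know the shape of the value stored under such a key, a variant of the same idea is to decode only one level into map[string]json.RawMessage and then decode that entry into a concrete type, which avoids type assertions. In this sketch errorMessages is a hypothetical helper, and map[string]string is an assumption about the payload:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errorMessages decodes the object stored under the zero-containing key,
// assuming all of its values are strings.
func errorMessages(data []byte) (map[string]string, error) {
	// Decode only the top level; each value stays undecoded JSON.
	var m map[string]json.RawMessage
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	raw, ok := m["\x00*\x00_errorMessages"]
	if !ok {
		return nil, errors.New("error messages key not found")
	}
	var errs map[string]string
	if err := json.Unmarshal(raw, &errs); err != nil {
		return nil, err
	}
	return errs, nil
}

func main() {
	j := []byte(`{"Page":1,"Fruits":["5","6"],"\u0000*\u0000_errorMessages":{"x":"123"},"*_successMessages":{"ok":"hi"}}`)
	errs, err := errorMessages(j)
	if err != nil {
		panic(err)
	}
	fmt.Println(errs["x"]) // 123
}
```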

Playground: http://play.golang.org/p/RtS7Nst0d7.

huangapple
  • This article was published on September 8, 2015 at 16:56:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/32453410.html