Why does json.Unmarshal need a pointer to a map, if a map is a reference type?

Question

I was working with json.Unmarshal and came across the following quirk. When running the code below, I get the error json: Unmarshal(non-pointer map[string]string)

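package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main() {
	m := make(map[string]string)
	data := `{"foo": "bar"}`
	// Passing m (not &m) makes Unmarshal fail with
	// json: Unmarshal(non-pointer map[string]string)
	err := json.Unmarshal([]byte(data), m)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println(m)
}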


Looking at the documentation for json.Unmarshal, there is seemingly no indication that a pointer is required. The closest I can find is the following line:

> Unmarshal parses the JSON-encoded data and stores the result in the value pointed to by v.

The lines describing how Unmarshal handles maps are similarly unclear, as they make no reference to pointers.

> To unmarshal a JSON object into a map, Unmarshal first establishes a map to use. If the map is nil, Unmarshal allocates a new map. Otherwise Unmarshal reuses the existing map, keeping existing entries. Unmarshal then stores key-value pairs from the JSON object into the map. The map's key type must either be a string, an integer, or implement encoding.TextUnmarshaler.

Why must I pass a pointer to json.Unmarshal, especially if maps are already reference types? I know that if I pass a map to a function and add data to the map inside it, the underlying data of the map is changed (a quick Playground experiment confirms this), which means it shouldn't matter whether I pass a pointer to a map. Can someone clear this up?
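The original Playground link is not preserved. The behavior the question relies on can be reproduced with a minimal sketch; `addEntry` is a hypothetical helper, not code from the question:

```go
package main

import "fmt"

// addEntry is a hypothetical helper. The map value is copied when it is
// passed, but both copies refer to the same underlying map data, so the
// new entry is visible to the caller.
func addEntry(m map[string]string) {
	m["foo"] = "bar"
}

func main() {
	m := make(map[string]string)
	addEntry(m)
	fmt.Println(m) // map[foo:bar]
}
```

This works only because the caller's map already exists. Adding entries never requires replacing the map value itself.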

Answer 1

Score: 24

As stated in the documentation:

> Unmarshal uses the inverse of the encodings that Marshal uses, allocating maps, slices, and pointers as necessary, with ...

Unmarshal may allocate the variable (map, slice, etc.). If we pass a map instead of a pointer to a map, the newly allocated map won't be visible to the caller. The following example demonstrates this:

package main

import (
    "fmt"
)

func mapFunc(m map[string]interface{}) {
    m = make(map[string]interface{})
    m["abc"] = "123"
}

func mapPtrFunc(mp *map[string]interface{}) {
    m := make(map[string]interface{})
    m["abc"] = "123"

    *mp = m
}

func main() {
    var m1, m2 map[string]interface{}
    mapFunc(m1)
    mapPtrFunc(&m2)
    
    fmt.Printf("%+v, %+v\n", m1, m2)
}

The output is:

map[], map[abc:123]

If a function or method may allocate a variable when necessary, and the newly allocated variable needs to be visible to the caller, there are two options: (a) return the variable from the function, or (b) assign it through a function/method argument. Since everything in Go is passed by value, in case (b) the argument must be a pointer. Here is what happens in the example above:

  1. At first, both maps m1 and m2 are nil.
  2. Calling mapFunc copies the value of m1 into m, so m is also a nil map.
  3. If the map had already been allocated in (1), then in (2) the address of the underlying map data structure referenced by m1 (not the address of m1 itself) would be copied to m. In that case m1 and m point to the same map data structure, so modifying map items through m1 is also visible through m.
  4. Inside mapFunc, a new map is allocated and assigned to m. There is no way to assign it to m1, so the caller never sees it.

In the pointer case:

  1. When calling mapPtrFunc, the address of m2 is copied to mp.
  2. Inside mapPtrFunc, a new map is allocated and assigned to *mp (not mp). Since mp points to m2, assigning the new map to *mp changes the value of m2. Note that the value of mp itself, i.e. the address of m2, is unchanged.
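Option (a), returning the newly allocated map instead of assigning it through a pointer argument, can be sketched as follows. `newMapFunc` is a hypothetical helper and is not part of the original example:

```go
package main

import "fmt"

// newMapFunc is a hypothetical variant of mapFunc that uses option (a):
// it allocates the map and returns it, so no pointer argument is needed.
func newMapFunc() map[string]interface{} {
	m := make(map[string]interface{})
	m["abc"] = "123"
	return m
}

func main() {
	m3 := newMapFunc()
	fmt.Printf("%+v\n", m3) // map[abc:123]
}
```

json.Unmarshal returns only an error, so it relies on option (b): the caller supplies a pointer, and Unmarshal stores the result through it.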

Answer 2

Score: 2

The other key part of the documentation is this:

> To unmarshal JSON into a pointer, Unmarshal first handles the case of
> the JSON being the JSON literal null. In that case, Unmarshal sets the
> pointer to nil. Otherwise, Unmarshal unmarshals the JSON into the
> value pointed at by the pointer. If the pointer is nil, Unmarshal
> allocates a new value for it to point to.

If Unmarshal accepted a map, it would have to leave the map in the same state whether the JSON were null or {}. By using pointers, there is now a difference between the pointer being set to nil and it pointing to an empty map.

Note that in order for Unmarshal to be able to "set the pointer to nil", you actually need to pass in a pointer to your map pointer:

package main

import (
	"encoding/json"
	"fmt"
	"log"
)

func main() {
	var m *map[string]string
	data := `{}`
	err := json.Unmarshal([]byte(data), &m)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(m)

	data = `null`
	err = json.Unmarshal([]byte(data), &m)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(m)

	data = `{"foo": "bar"}`
	err = json.Unmarshal([]byte(data), &m)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(m)
}

This outputs:

&map[]
<nil>
&map[foo:bar]
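The same idea applies one level down, to an ordinary map variable passed by address. In this minimal sketch, `decodeInto` is a hypothetical helper. Decoding {} reuses the existing map and keeps its entries, as the documentation quoted in the question describes. Decoding null sets the map variable itself to nil:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// decodeInto is a hypothetical helper: it unmarshals data into the map
// variable that mp points to.
func decodeInto(data string, mp *map[string]string) {
	if err := json.Unmarshal([]byte(data), mp); err != nil {
		log.Fatal(err)
	}
}

func main() {
	m := map[string]string{"old": "entry"}

	decodeInto(`{}`, &m)
	// {} reuses the existing map and keeps its entries.
	fmt.Println(m == nil, m) // false map[old:entry]

	decodeInto(`null`, &m)
	// null replaces the map variable itself with nil.
	fmt.Println(m == nil, m) // true map[]
}
```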

Answer 3

Score: 1

Your viewpoint is no different from saying "a slice is nothing but a pointer". Slices (and maps) use pointers to stay lightweight, yes, but there is more to how they work. A slice, for example, also carries its length and capacity.
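The point about length and capacity can be seen with slices. In this minimal sketch, `appendOne` is a hypothetical helper. The callee shares the backing array but only has a copy of the slice header, so the caller's length does not change:

```go
package main

import "fmt"

// appendOne is a hypothetical helper. It receives a copy of the slice
// header (pointer, length, capacity); append updates only that copy.
func appendOne(s []int) {
	s = append(s, 42)
}

func main() {
	s := make([]int, 0, 4) // spare capacity, so append writes into the shared array
	appendOne(s)
	fmt.Println(len(s), cap(s)) // 0 4: the caller's header is unchanged
	fmt.Println(s[:1])          // [42]: the element landed in the shared backing array
}
```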

As for why this happens, from a code perspective, the last line of json.Unmarshal calls d.unmarshal(), which executes the code in lines 176-179 of decode.go. It basically says "if the value isn't a pointer, or is nil, return an InvalidUnmarshalError."
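That check can be observed from the outside. The sketch below assumes Go 1.13 or later for errors.As. `unmarshalErr` is a hypothetical helper that reports whether Unmarshal rejected its target with a *json.InvalidUnmarshalError:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// unmarshalErr is a hypothetical helper: it decodes a fixed JSON object
// into v and describes the outcome.
func unmarshalErr(v interface{}) string {
	err := json.Unmarshal([]byte(`{"foo": "bar"}`), v)
	var invalid *json.InvalidUnmarshalError
	if errors.As(err, &invalid) {
		return "InvalidUnmarshalError: " + err.Error()
	}
	if err != nil {
		return "other error: " + err.Error()
	}
	return "ok"
}

func main() {
	m := make(map[string]string)
	var nilPtr *map[string]string

	fmt.Println(unmarshalErr(m))      // non-pointer: rejected
	fmt.Println(unmarshalErr(nilPtr)) // nil pointer: rejected
	fmt.Println(unmarshalErr(&m))     // non-nil pointer: ok
}
```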

The docs could probably be clearer about this, but consider a couple of points:

  1. How would the JSON null value be assigned to the map as nil if you don't pass a pointer to the map? If you need to modify the map itself (rather than the items in the map), it makes sense to pass a pointer to the thing that needs to be modified. In this case, that's the map.
  2. Alternatively, suppose you passed a nil map to json.Unmarshal. The decoder would eventually call the equivalent of make(map[string]string) and unmarshal the values into that new map. However, the map in your function would still be nil, because it never pointed to anything. There's no way to fix this other than passing a pointer to the map.
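Point 2 can be illustrated with a minimal sketch of the pointer version. In it, `decodeFromNil` is a hypothetical helper that starts from a nil map and passes its address. Because Unmarshal receives &m, the map it allocates is stored back into the caller's variable:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// decodeFromNil is a hypothetical helper: it starts from a nil map and
// returns whatever Unmarshal stored in it.
func decodeFromNil(data string) (map[string]string, error) {
	var m map[string]string // nil map: nothing allocated yet
	// Passing &m lets Unmarshal allocate a new map and store it in m.
	err := json.Unmarshal([]byte(data), &m)
	return m, err
}

func main() {
	m, err := decodeFromNil(`{"foo": "bar"}`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(m == nil, m) // false map[foo:bar]
}
```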

However, let's say there were no need to pass the address of your map because "it's already a pointer", and you've already initialized the map, so it's not nil. What happens then? If I bypass the test in the lines mentioned earlier by changing line 176 to read if rv.Kind() != reflect.Map && rv.Kind() != reflect.Ptr || rv.IsNil() {, then this can happen:

`{"foo":"bar"}`: false map[foo:bar]
`{}`: false map[]
`null`: panic: reflect: reflect.Value.Set using unaddressable value [recovered]
	panic: interface conversion: string is not error: missing method Error

goroutine 1 [running]:
json.(*decodeState).unmarshal.func1(0xc420039e70)
	/home/kit/jstest/src/json/decode.go:172 +0x99
panic(0x4b0a00, 0xc42000e410)
	/usr/lib/go/src/runtime/panic.go:489 +0x2cf
reflect.flag.mustBeAssignable(0x15)
	/usr/lib/go/src/reflect/value.go:228 +0xf9
reflect.Value.Set(0x4b8b00, 0xc420012300, 0x15, 0x4b8b00, 0x0, 0x15)
	/usr/lib/go/src/reflect/value.go:1345 +0x2f
json.(*decodeState).literalStore(0xc420084360, 0xc42000e3f8, 0x4, 0x8, 0x4b8b00, 0xc420012300, 0x15, 0xc420000100)
	/home/kit/jstest/src/json/decode.go:883 +0x2797
json.(*decodeState).literal(0xc420084360, 0x4b8b00, 0xc420012300, 0x15)
	/home/kit/jstest/src/json/decode.go:799 +0xdf
json.(*decodeState).value(0xc420084360, 0x4b8b00, 0xc420012300, 0x15)
	/home/kit/jstest/src/json/decode.go:405 +0x32e
json.(*decodeState).unmarshal(0xc420084360, 0x4b8b00, 0xc420012300, 0x0, 0x0)
	/home/kit/jstest/src/json/decode.go:184 +0x224
json.Unmarshal(0xc42000e3f8, 0x4, 0x8, 0x4b8b00, 0xc420012300, 0x8, 0x0)
	/home/kit/jstest/src/json/decode.go:104 +0x148
main.main()
	/home/kit/jstest/src/jstest/main.go:16 +0x1af

Code leading to that output:

package main

// Note "json" is the local copy of the "encoding/json" source that I modified.
import (
	"fmt"
	"json"
)

func main() {
	for _, data := range []string{
		`{"foo":"bar"}`,
		`{}`,
		`null`,
	} {
		m := make(map[string]string)
		fmt.Printf("%#q: ", data)
		if err := json.Unmarshal([]byte(data), m); err != nil {
			fmt.Println(err)
		} else {
			fmt.Println(m == nil, m)
		}
	}
}

The key is this bit here:

reflect.Value.Set using unaddressable value

Because you passed a copy of the map, it's unaddressable (i.e. it has a temporary address or even no address from the low-level machine perspective). I know of one way around this (x := new(Type) followed by *x = value, except using the reflect package), but it doesn't actually solve the problem; you're creating a local pointer that can't be returned to the caller and using it instead of your original storage location!
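The addressability problem, and why the reflect-based workaround doesn't help, can be sketched with the reflect package directly. This illustrates the idea and is not the decoder's actual code; the four helpers are hypothetical:

```go
package main

import (
	"fmt"
	"reflect"
)

// canSetByValue reports whether reflect can Set a map passed by value.
func canSetByValue(m map[string]string) bool {
	return reflect.ValueOf(m).CanSet() // a copy: not addressable
}

// canSetByPointer reports whether reflect can Set the map behind mp.
func canSetByPointer(mp *map[string]string) bool {
	return reflect.ValueOf(mp).Elem().CanSet() // aliases the caller's variable
}

// nullViaCopy mimics the workaround: copy m into freshly allocated storage
// (reflect's form of x := new(T); *x = value), then set that copy to nil.
// Only the local copy changes; the caller's variable is untouched.
func nullViaCopy(m map[string]string) {
	v := reflect.ValueOf(m)
	local := reflect.New(v.Type()).Elem()
	local.Set(v)
	local.Set(reflect.Zero(local.Type()))
}

// nullViaPointer sets the caller's map variable to nil through reflect.
func nullViaPointer(mp *map[string]string) {
	v := reflect.ValueOf(mp).Elem()
	v.Set(reflect.Zero(v.Type()))
}

func main() {
	m := map[string]string{"foo": "bar"}
	fmt.Println(canSetByValue(m), canSetByPointer(&m)) // false true

	nullViaCopy(m)
	fmt.Println(m == nil) // false: only the local copy was cleared

	nullViaPointer(&m)
	fmt.Println(m == nil) // true
}
```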

So now try a pointer:

		if err := json.Unmarshal([]byte(data), &m); err != nil {
			fmt.Println(err)
		} else {
			fmt.Println(m == nil, m)
		}

Output:

`{"foo":"bar"}`: false map[foo:bar]
`{}`: false map[]
`null`: true map[]

Now it works. Bottom line: use a pointer if the object itself may be modified, and the docs say it might be, e.g. when null is used where an object or array (map or slice) is expected.

huangapple
  • Published on 2017-07-16 04:33:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/45122496.html