Unmarshalling `time.Time` from JSON fails when escaping '+' as `\u002b` in files but works in plain strings: cannot parse "\\u002b00:00\"" as "Z07:00"

Question

I'm unmarshalling into a struct that has a time.Time field named Foo:

type AStructWithTime struct {
	Foo time.Time `json:"foo"`
}

I expect to get something like this after unmarshalling:

var expectedStruct = AStructWithTime{
	Foo: time.Date(2022, 9, 26, 21, 0, 0, 0, time.UTC),
}

Working Example 1: Plain JSON Objects into Structs

This works fine with plain JSON strings:

func Test_Unmarshalling_DateTime_From_String(t *testing.T) {
	jsonStrings := []string{
		"{\"foo\": \"2022-09-26T21:00:00Z\"}",           // trailing Z = UTC offset
		"{\"foo\": \"2022-09-26T21:00:00+00:00\"}",      // explicit zero offset
		"{\"foo\": \"2022-09-26T21:00:00\u002b00:00\"}", // \u002b is an escaped '+'
	}
	for _, jsonString := range jsonStrings {
		var deserializedStruct AStructWithTime
		err := json.Unmarshal([]byte(jsonString), &deserializedStruct)
		if err != nil {
			t.Fatalf("Could not unmarshal '%s': %v", jsonString, err) // doesn't happen
		}
		if deserializedStruct.Foo.Unix() != expectedStruct.Foo.Unix() {
			t.Fatal("Unmarshalling is erroneous") // doesn't happen
		}
		// works; no errors
	}
}

Working Example 2: JSON Array into Slice

It also works if I unmarshal the same objects from a JSON array into a slice:

func Test_Unmarshalling_DateTime_From_Array(t *testing.T) {
	// these are just the same objects as above, just all in one array instead of as single objects/dicts
	jsonArrayString := "[{\"foo\": \"2022-09-26T21:00:00Z\"},{\"foo\": \"2022-09-26T21:00:00+00:00\"},{\"foo\": \"2022-09-26T21:00:00\u002b00:00\"}]"
	var slice []AStructWithTime // and now I need to unmarshal into a slice
	unmarshalErr := json.Unmarshal([]byte(jsonArrayString), &slice)
	if unmarshalErr != nil {
		t.Fatalf("Could not unmarshal array: %v", unmarshalErr)
	}
	for index, instance := range slice {
		if instance.Foo.Unix() != expectedStruct.Foo.Unix() {
			t.Fatalf("Unmarshalling failed for index %v: Expected %v but got %v", index, expectedStruct.Foo, instance.Foo)
		}
	}
	// works; no errors
}

Not Working Example

Now I do the same unmarshalling with JSON read from a file, "test.json". Its content is the array from the working example above:

[
  {
    "foo": "2022-09-26T21:00:00Z"
  },
  {
    "foo": "2022-09-26T21:00:00+00:00"
  },
  {
    "foo": "2022-09-26T21:00:00\u002b00:00"
  }
]

The code is:

func Test_Unmarshalling_DateTime_From_File(t *testing.T) {
	fileName := "test.json"
	fileContent, readErr := os.ReadFile(filepath.FromSlash(fileName))
	if readErr != nil {
		t.Fatalf("Could not read file %s: %v", fileName, readErr)
	}
	if len(fileContent) == 0 { // os.ReadFile returns a non-nil slice even for an empty file
		t.Fatalf("File %s must not be empty", fileName)
	}
	var slice []AStructWithTime
	unmarshalErr := json.Unmarshal(fileContent, &slice)
	if unmarshalErr != nil {
		// ERROR HAPPENS HERE
		// Could not unmarshal file content test.json: parsing time "\"2022-09-26T21:00:00\\u002b00:00\"" as "\"2006-01-02T15:04:05Z07:00\"": cannot parse "\\u002b00:00\"" as "Z07:00"
		t.Fatalf("Could not unmarshal file content %s: %v", fileName, unmarshalErr)
	}
	for index, instance := range slice {
		if instance.Foo.Unix() != expectedStruct.Foo.Unix() {
			t.Fatalf("Unmarshalling failed for index %v in file %s. Expected %v but got %v", index, fileName, expectedStruct.Foo, instance.Foo)
		}
	}
}

It fails because of the escaped '+':
> parsing time "\"2022-09-26T21:00:00\\u002b00:00\"" as "\"2006-01-02T15:04:05Z07:00\"": cannot parse "\\u002b00:00\"" as "Z07:00"

Question: Why does unmarshalling the time.Time field fail when the JSON is read from a file, but work when the same JSON is read from an identical string?

Answer 1

Score: 5

I believe that this is a bug in encoding/json.

Both the JSON grammar at https://www.json.org and the IETF definition of JSON at RFC 8259, Section 7: Strings provide that a JSON string may contain Unicode escape sequences:

> 7. Strings
>
> The representation of strings is similar to conventions used in the C
> family of programming languages. A string begins and ends with quotation
> marks. All Unicode characters may be placed within the quotation marks,
> except for the characters that MUST be escaped: quotation mark, reverse
> solidus, and the control characters (U+0000 through U+001F).
>
> Any character may be escaped. If the character is in the Basic
> Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a
> six-character sequence: a reverse solidus, followed by the lowercase letter
> u, followed by four hexadecimal digits that encode the character's code
> point. The hexadecimal letters A through F can be uppercase or lowercase.
> So, for example, a string containing only a single reverse solidus
> character may be represented as "\u005C".
>
> . . .
>
> To escape an extended character that is not in the Basic Multilingual
> Plane, the character is represented as a 12-character sequence, encoding
> the UTF-16 surrogate pair. So, for example, a string containing only the
> G-clef character (U+1D11E) may be represented as "\uD834\uDD1E".
>
> string = quotation-mark *char quotation-mark
>
> char = unescaped /
>        escape (
>           %x22 /          ; "    quotation mark  U+0022
>           %x5C /          ; \    reverse solidus U+005C
>           %x2F /          ; /    solidus         U+002F
>           %x62 /          ; b    backspace       U+0008
>           %x66 /          ; f    form feed       U+000C
>           %x6E /          ; n    line feed       U+000A
>           %x72 /          ; r    carriage return U+000D
>           %x74 /          ; t    tab             U+0009
>           %x75 4HEXDIG )  ; uXXXX                U+XXXX
>
> escape = %x5C              ; \
>
> quotation-mark = %x22      ; "
>
> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF

The JSON document from the original post

{
  "foo": "2022-09-26T21:00:00\u002b00:00"
}

parses and deserializes perfectly fine in Node.js using JSON.parse().
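
For comparison, Go's encoding/json resolves the same escapes when the target is a plain string. The following is only a minimal sketch, using the RFC's own examples plus a fragment of the timestamp. The helper name decodeJSONString is made up for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString decodes one JSON string literal into a Go string.
// The helper name is hypothetical; it only wraps json.Unmarshal.
func decodeJSONString(raw string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	// Raw (backquoted) literals keep the backslashes, so encoding/json
	// receives the escape sequences exactly as a file would contain them.
	inputs := []string{
		`"\u005C"`,              // a single reverse solidus
		`"\uD834\uDD1E"`,        // G clef (U+1D11E) as a UTF-16 surrogate pair
		`"21:00:00\u002b00:00"`, // the escaped '+' from the question
	}
	for _, raw := range inputs {
		s, err := decodeJSONString(raw)
		fmt.Printf("%s -> %q (%d rune(s)), err=%v\n", raw, s, len([]rune(s)), err)
	}
}
```

So the JSON decoder itself understands these escapes; what matters is which code ends up interpreting the string value.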

Here's an example demonstrating the bug:

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// document is a raw string literal, so the \u002b escape reaches
// encoding/json untouched, just as it would when read from a file.
var document = []byte(`
{
  "value": "2022-09-26T21:00:00\u002b00:00"
}
`)

func main() {
	deserializeJsonAsTime()
	deserializeJsonAsString()
}

func deserializeJsonAsTime() {
	fmt.Println("")
	fmt.Println("Deserializing JSON as time.Time ...")

	type Widget struct {
		Value time.Time `json:"value"`
	}

	expected := Widget{
		Value: time.Date(2022, 9, 26, 21, 0, 0, 0, time.UTC),
	}
	actual := Widget{}
	err := json.Unmarshal(document, &actual)

	switch {
	case err != nil:
		fmt.Println("Error deserializing JSON as time.Time")
		fmt.Println(err)
	case !actual.Value.Equal(expected.Value): // Equal compares instants and ignores the Location
		fmt.Printf("Unmarshalling failed: expected %v but got %v\n", expected.Value, actual.Value)
	default:
		fmt.Println("Success")
	}
}

func deserializeJsonAsString() {
	fmt.Println("")
	fmt.Println("Deserializing JSON as string ...")

	type Widget struct {
		Value string `json:"value"`
	}

	expected := Widget{
		Value: "2022-09-26T21:00:00+00:00",
	}
	actual := Widget{}
	err := json.Unmarshal(document, &actual)

	switch {
	case err != nil:
		fmt.Println("Error deserializing JSON as string")
		fmt.Println(err)
	case actual.Value != expected.Value:
		fmt.Printf("Unmarshalling failed: expected %v but got %v\n", expected.Value, actual.Value)
	default:
		fmt.Println("Success")
	}
}

When run (see https://goplay.tools/snippet/fHQQVJ8GfPp), we get:

Deserializing JSON as time.Time ...
Error deserializing JSON as time.Time
parsing time "\"2022-09-26T21:00:00\\u002b00:00\"" as "\"2006-01-02T15:04:05Z07:00\"": cannot parse "\\u002b00:00\"" as "Z07:00"
Deserializing JSON as string ...
Success

Deserializing a JSON string that contains Unicode escape sequences into a string yields the correct, expected result: the escape sequence is turned into the expected rune/byte sequence. The problem therefore seems to lie in the code that handles deserialization to time.Time. It does not appear to deserialize to a string first and then parse that string value as a time.Time.

Answer 2

Score: 0

As Brits points out, this is a known issue: "time: UnmarshalJSON does not respect escaped unicode characters". The two errors that come up when calling json.Unmarshal on the document {"value": "2022-09-26T21:00:00\u002b00:00"} can be worked around as follows.

  • Unmarshalling fails when '+' is escaped as '\u002b'

    • Solution: convert the escaped Unicode to UTF-8 with strconv.Unquote
  • cannot parse "\\u002b00:00\"" as "Z07:00"

    • Solution: parse the time with the layout "2006-01-02T15:04:05-07:00"

To make this work with json.Unmarshal, we can define a new type, utf8Time:

type utf8Time struct {
	time.Time
}

func (t *utf8Time) UnmarshalJSON(data []byte) error {
	str, err := strconv.Unquote(string(data))
	if err != nil {
		return err
	}
	tmpT, err := time.Parse("2006-01-02T15:04:05-07:00", str)
	if err != nil {
		return err
	}
	*t = utf8Time{tmpT}
	return nil
}

func (t utf8Time) String() string {
	return t.Format("2006-01-02 15:04:05.999999999 -0700 MST")
}

Then call json.Unmarshal:

type MyDoc struct {
	Value utf8Time `json:"value"`
}

var document = []byte(`{"value": "2022-09-26T21:00:00\u002b00:00"}`)

func main() {
	var mydoc MyDoc
	err := json.Unmarshal(document, &mydoc)
	if err != nil {
		fmt.Println(err)
	}
	fmt.Println(mydoc.Value)
}

Output

2022-09-26 21:00:00 +0000 +0000

huangapple
  • Posted on September 27, 2022 at 06:12:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/73860458.html