Unmarshalling `time.Time` from JSON fails when escaping '+' as `\u002b` in files but works in plain strings: cannot parse "\\u002b00:00\"" as "Z07:00"
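Question
Why does unmarshalling a time.Time field fail when the JSON is read from a file, but succeed when the same JSON is written as a Go string?
The two inputs are not actually identical. In a double-quoted Go string literal, the Go compiler itself converts \u002b to '+', so json.Unmarshal never sees an escape sequence. In the file, the six characters \u002b reach the decoder unchanged, and the UnmarshalJSON method of time.Time receives the raw, still-escaped JSON string and fails to parse it.
Switching from json.Unmarshal to json.Decoder does not change this by itself, because the decoder hands the same raw bytes to the UnmarshalJSON method of time.Time. It is still a reasonable way to read JSON from a file, so the file-based test is shown here using it:
func Test_Unmarshalling_DateTime_From_File(t *testing.T) {
	fileName := "test.json"
	fileContent, readErr := os.ReadFile(filepath.FromSlash(fileName))
	if readErr != nil {
		t.Fatalf("Could not read file %s: %v", fileName, readErr)
	}
	// os.ReadFile returns a non-nil slice for an empty file, so check the length.
	if len(fileContent) == 0 {
		t.Fatalf("File %s must not be empty", fileName)
	}
	var slice []AStructWithTime
	decoder := json.NewDecoder(bytes.NewReader(fileContent))
	decoder.DisallowUnknownFields() // Optional: disallow unknown fields in the JSON
	decodeErr := decoder.Decode(&slice)
	if decodeErr != nil {
		t.Fatalf("Could not decode file content %s: %v", fileName, decodeErr)
	}
	for index, instance := range slice {
		if instance.Foo.Unix() != expectedStruct.Foo.Unix() {
			t.Fatalf("Unmarshalling failed for index %v in file %s. Expected %v but got %v", index, fileName, expectedStruct.Foo, instance.Foo)
		}
	}
}
On a Go version with this behaviour, the test above is expected to fail in the same way as the json.Unmarshal version. The value has to be unescaped before it is parsed as a time, which is what the custom UnmarshalJSON in the second answer does.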
I'm unmarshalling into a struct that has a time.Time field named Foo:
type AStructWithTime struct {
Foo time.Time `json:"foo"`
}
After unmarshalling, I expect to get something like this:
var expectedStruct = AStructWithTime{
Foo: time.Date(2022, 9, 26, 21, 0, 0, 0, time.UTC),
}
Working Example 1: Plain JSON Objects into Structs
This works fine when working with plain JSON strings:
func Test_Unmarshalling_DateTime_From_String(t *testing.T) {
jsonStrings := []string{
"{\"foo\": \"2022-09-26T21:00:00Z\"}", // trailing Z = UTC offset
"{\"foo\": \"2022-09-26T21:00:00+00:00\"}", // explicit zero offset
"{\"foo\": \"2022-09-26T21:00:00\u002b00:00\"}", // \u002b is an escaped '+'
}
for _, jsonString := range jsonStrings {
var deserializedStruct AStructWithTime
err := json.Unmarshal([]byte(jsonString), &deserializedStruct)
if err != nil {
t.Fatalf("Could not unmarshal '%s': %v", jsonString, err) // doesn't happen
}
if deserializedStruct.Foo.Unix() != expectedStruct.Foo.Unix() {
t.Fatal("Unmarshalling is erroneous") // doesn't happen
}
// works; no errors
}
}
Working Example 2: JSON Array into Slice
It also works if I unmarshal the same objects from a JSON array into a slice:
func Test_Unmarshalling_DateTime_From_Array(t *testing.T) {
// these are just the same objects as above, just all in one array instead of as single objects/dicts
jsonArrayString := "[{\"foo\": \"2022-09-26T21:00:00Z\"},{\"foo\": \"2022-09-26T21:00:00+00:00\"},{\"foo\": \"2022-09-26T21:00:00\u002b00:00\"}]"
var slice []AStructWithTime // and now I need to unmarshal into a slice
unmarshalErr := json.Unmarshal([]byte(jsonArrayString), &slice)
if unmarshalErr != nil {
t.Fatalf("Could not unmarshal array: %v", unmarshalErr)
}
for index, instance := range slice {
if instance.Foo.Unix() != expectedStruct.Foo.Unix() {
t.Fatalf("Unmarshalling failed for index %v: Expected %v but got %v", index, expectedStruct.Foo, instance.Foo)
}
}
// works; no errors
}
Not Working Example
Now I do the same unmarshalling with JSON read from a file "test.json". Its content is the array from the working example above:
[
{
"foo": "2022-09-26T21:00:00Z"
},
{
"foo": "2022-09-26T21:00:00+00:00"
},
{
"foo": "2022-09-26T21:00:00\u002b00:00"
}
]
The code is:
func Test_Unmarshalling_DateTime_From_File(t *testing.T) {
fileName := "test.json"
fileContent, readErr := os.ReadFile(filepath.FromSlash(fileName))
if readErr != nil {
t.Fatalf("Could not read file %s: %v", fileName, readErr)
}
if len(fileContent) == 0 { // os.ReadFile returns a non-nil slice for an empty file
t.Fatalf("File %s must not be empty", fileName)
}
var slice []AStructWithTime
unmarshalErr := json.Unmarshal(fileContent, &slice)
if unmarshalErr != nil {
// ERROR HAPPENS HERE
// Could not unmarshal file content test.json: parsing time "\"2022-09-26T21:00:00\\u002b00:00\"" as "\"2006-01-02T15:04:05Z07:00\"": cannot parse "\\u002b00:00\"" as "Z07:00"
t.Fatalf("Could not unmarshal file content %s: %v", fileName, unmarshalErr)
}
for index, instance := range slice {
if instance.Foo.Unix() != expectedStruct.Foo.Unix() {
t.Fatalf("Unmarshalling failed for index %v in file %s. Expected %v but got %v", index, fileName, expectedStruct.Foo, instance.Foo)
}
}
}
It fails because of the escaped '+'.
> parsing time ""2022-09-26T21:00:00\u002b00:00"" as ""2006-01-02T15:04:05Z07:00"": cannot parse "\u002b00:00"" as "Z07:00"
Question: Why does unmarshalling the time.Time field fail when it's being read from a file but works when the same json is read from an identical string?
Answer 1
Score: 5
I believe that this is a bug in encoding/json.
Both the JSON grammar at https://www.json.org and the IETF definition of JSON at RFC 8259, Section 7: Strings provide that a JSON string may contain Unicode escape sequences:
> 7. Strings
>
> The representation of strings is similar to conventions used in the C
> family of programming languages. A string begins and ends with quotation
> marks. All Unicode characters may be placed within the quotation marks,
> except for the characters that MUST be escaped: quotation mark, reverse
> solidus, and the control characters (U+0000 through U+001F).
>
> Any character may be escaped. If the character is in the Basic
> Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a
> six-character sequence: a reverse solidus, followed by the lowercase letter
> u, followed by four hexadecimal digits that encode the character's code
> point. The hexadecimal letters A through F can be uppercase or lowercase.
> So, for example, a string containing only a single reverse solidus
> character may be represented as "\u005C".
>
> . . .
>
> To escape an extended character that is not in the Basic Multilingual
> Plane, the character is represented as a 12-character sequence, encoding
> the UTF-16 surrogate pair. So, for example, a string containing only the
> G-clef character (U+1D11E) may be represented as "\uD834\uDD1E".
>
> string = quotation-mark *char quotation-mark
>
> char = unescaped /
> escape (
> %x22 / ; " quotation mark U+0022
> %x5C / ; \ reverse solidus U+005C
> %x2F / ; / solidus U+002F
> %x62 / ; b backspace U+0008
> %x66 / ; f form feed U+000C
> %x6E / ; n line feed U+000A
> %x72 / ; r carriage return U+000D
> %x74 / ; t tab U+0009
> %x75 4HEXDIG ) ; uXXXX U+XXXX
>
> escape = %x5C ; \
>
> quotation-mark = %x22 ; "
>
> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
>
>
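The escape rules quoted above can be checked against how encoding/json decodes plain strings. This is a minimal sketch; the helper name decodeJSONString is made up for illustration. It decodes the two examples from the RFC text plus the escaped '+' from the question:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString decodes a single JSON string literal into a Go string.
func decodeJSONString(literal string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(literal), &s)
	return s, err
}

func main() {
	// Raw string literals keep the backslashes, so encoding/json sees the escapes.
	literals := []string{
		`"\u005C"`,       // reverse solidus, from the RFC example
		`"\u002b"`,       // the escaped '+' from the question
		`"\uD834\uDD1E"`, // G-clef as a UTF-16 surrogate pair, from the RFC example
	}
	for _, literal := range literals {
		s, err := decodeJSONString(literal)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s decodes to %q (%d rune(s))\n", literal, s, len([]rune(s)))
	}
}
```

All three escapes decode to a single character when the target is a string, which matches the grammar; the time.Time case discussed here is the one that deviates.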
The JSON document from the original post
{
"foo": "2022-09-26T21:00:00\u002b00:00"
}
parses and deserializes perfectly fine in Node.js using JSON.parse().
Here's an example demonstrating the bug:
package main
import (
"encoding/json"
"fmt"
"time"
)
var document []byte = []byte(`
{
"value": "2022-09-26T21:00:00\u002b00:00"
}
`)
func main() {
deserializeJsonAsTime()
deserializeJsonAsString()
}
func deserializeJsonAsTime() {
fmt.Println("")
fmt.Println("Deserializing JSON as time.Time ...")
type Widget struct {
Value time.Time `json:"value"`
}
expected := Widget{
Value: time.Date(2022, 9, 26, 21, 0, 0, 0, time.UTC),
}
actual := Widget{}
err := json.Unmarshal(document, &actual)
switch {
case err != nil:
fmt.Println("Error deserializing JSON as time.Time")
fmt.Println(err)
case !actual.Value.Equal(expected.Value): // Equal compares the instant, ignoring location
fmt.Printf("Unmarshalling failed: expected %v but got %v\n", expected.Value, actual.Value)
default:
fmt.Println("Success")
}
}
func deserializeJsonAsString() {
fmt.Println("")
fmt.Println("Deserializing JSON as string ...")
type Widget struct {
Value string `json:"value"`
}
expected := Widget{
Value: "2022-09-26T21:00:00+00:00",
}
actual := Widget{}
err := json.Unmarshal(document, &actual)
switch {
case err != nil:
fmt.Println("Error deserializing JSON as string")
fmt.Println(err)
case actual.Value != expected.Value:
fmt.Printf("Unmarshalling failed: expected %v but got %v\n", expected.Value, actual.Value)
default:
fmt.Println("Success")
}
}
When run (see https://goplay.tools/snippet/fHQQVJ8GfPp), we get:
Deserializing JSON as time.Time ...
Error deserializing JSON as time.Time
parsing time "\"2022-09-26T21:00:00\\u002b00:00\"" as "\"2006-01-02T15:04:05Z07:00\"": cannot parse "\\u002b00:00\"" as "Z07:00"
Deserializing JSON as string ...
Success
Deserializing a JSON string that contains Unicode escape sequences into a string yields the expected result: the escape sequence is turned into the expected rune/byte sequence. The problem therefore seems to lie in the code that handles deserialization to time.Time. It does not appear to deserialize to a string first and then parse that string value as a time.Time.
Answer 2
Score: 0
As Brits points out, this is a known issue: "time: UnmarshalJSON does not respect escaped unicode characters". The two errors that occur when calling json.Unmarshal on the document {"value": "2022-09-26T21:00:00\u002b00:00"} can be worked around as follows.
- JSON fails when escaping '+' as '\u002b'
  - Solution: convert the escaped Unicode to UTF-8 with strconv.Unquote
- cannot parse "\\u002b00:00\"" as "Z07:00"
  - Solution: parse the time with the format "2006-01-02T15:04:05-07:00" (stdNumColonTZ // "-07:00", from src/time/format.go)
  - If you want to parse the time zone from it, time.ParseInLocation could be used.
To make it compatible with json.Unmarshal, we could define a new type utf8Time:
type utf8Time struct {
time.Time
}
func (t *utf8Time) UnmarshalJSON(data []byte) error {
str, err := strconv.Unquote(string(data))
if err != nil {
return err
}
tmpT, err := time.Parse("2006-01-02T15:04:05-07:00", str)
if err != nil {
return err
}
*t = utf8Time{tmpT}
return nil
}
func (t utf8Time) String() string {
return t.Format("2006-01-02 15:04:05.999999999 -0700 MST")
}
Then call json.Unmarshal:
type MyDoc struct {
Value utf8Time `json:"value"`
}
var document = []byte(`{"value": "2022-09-26T21:00:00\u002b00:00"}`)
func main() {
var mydoc MyDoc
err := json.Unmarshal(document, &mydoc)
if err != nil {
fmt.Println(err)
}
fmt.Println(mydoc.Value)
}
Output
2022-09-26 21:00:00 +0000 +0000