Golang Decoding/Unmarshaling invalid unicode in JSON
Question
I am fetching JSON files in Go that are not formatted consistently.
For example, I can have the following:
{"email": "\"blah.blah@blah.com\""}
{"email": "robert@gmail.com"}
{"name": "m\303\203ead"}
We can see that the escape characters will cause a problem.
Using json.Decode with:
{"name": "m\303\203ead"}
I get the error: invalid character '3' in string escape code
I have tried several approaches to normalise my data, for example by going through a string array (it works, but there are too many edge cases), or even by filtering out escape characters.
Finally, I came across this article: http://blog.golang.org/normalization
The solution it proposes seemed very interesting.
I have tried the following:
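isMn := func(r rune) bool {
    return unicode.Is(unicode.Mn, r)
}
// Note: the blog post chains these as norm.NFD, RemoveFunc(isMn), norm.NFC.
t := transform.Chain(norm.NFC, transform.RemoveFunc(isMn), norm.NFD)
fileReader, err := bucket.GetReader(filename)
if err != nil {
    log.Fatal(err)
}
transformReader := transform.NewReader(fileReader, t)
decoder := json.NewDecoder(transformReader)
for {
    var dataModel Model
    if err := decoder.Decode(&dataModel); err == io.EOF {
        break
    } else if err != nil {
        log.Fatal(err) // any other decode error stops the loop
    }
    // DO SOMETHING with dataModel
}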
With Model being:
type Model struct {
    Name  string `json:"name" bson:"name"`
    Email string `json:"email" bson:"email"`
}
I have tried several variations of it, but haven't been able to get it working.
So my question is: how can I easily handle decoding/unmarshaling JSON data with inconsistent escaping and encodings, given that I have no control over those JSON files?
If you are reading this, thank you anyway.
Answer 1
Score: 4
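You can use json.RawMessage instead of string, so that json.Decode keeps the field's raw bytes instead of unescaping them. (The decoder still checks the JSON syntax first, so an escape it considers invalid can still produce an error.)
playground: http://play.golang.org/p/fB-38KGAO0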
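package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

type Model struct {
    N json.RawMessage `json:"name" bson:"name"`
}

// Name returns the raw JSON bytes of the name field, surrounding quotes included.
func (m *Model) Name() string {
    return string(m.N)
}

func main() {
    // Raw string literal, so the backslashes reach the decoder unchanged.
    s := `{"name": "m\303\203ead"}`
    r := strings.NewReader(s)
    d := json.NewDecoder(r)
    m := Model{}
    fmt.Println(d.Decode(&m))
    fmt.Println(m.Name())
}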
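A minimal sketch of what json.RawMessage does and does not preserve, assuming the input literally contains a backslash followed by digits. The sample strings and the decodeName helper are illustrative and not part of the original post. As far as I can tell, the decoder validates escape sequences while scanning, before the field type comes into play, so it is worth checking the returned error rather than relying on the raw value:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Model keeps the name as raw JSON bytes instead of a decoded string.
type Model struct {
	N json.RawMessage `json:"name"`
}

// decodeName (hypothetical helper) decodes one JSON object and returns
// the raw bytes stored for "name" together with the decoder's error.
func decodeName(input string) (string, error) {
	var m Model
	err := json.NewDecoder(strings.NewReader(input)).Decode(&m)
	return string(m.N), err
}

func main() {
	// Valid JSON: the raw value keeps its quotes and escapes untouched.
	raw, err := decodeName(`{"name": "a\"b"}`)
	fmt.Println(raw, err)

	// \3 is not a JSON escape: inspect the error before using raw.
	raw, err = decodeName(`{"name": "m\303\203ead"}`)
	fmt.Println(raw, err)
}
```

If the second call reports a syntax error, the input has to be cleaned up before it reaches the decoder, whatever the field type is.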
Edit: You can also clean the input with a regex first, though I'm not sure how viable that is for you: http://play.golang.org/p/VYJKTKmiYm
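// Compiled once: a backslash followed by three digits, e.g. \303.
var re = regexp.MustCompile(`\b(\\\d\d\d)`)

// cleanUp rewrites \303 as \u0303. The result is valid JSON syntax,
// though the digits are then read as hex rather than octal.
func cleanUp(s string) string {
    return re.ReplaceAllStringFunc(s, func(s string) string {
        return `\u0` + s[1:]
    })
}

func main() {
    // Raw string literal, so the backslashes reach cleanUp unchanged.
    s := `{"name": "m\303\203ead"}`
    s = cleanUp(s)
    r := strings.NewReader(s)
    d := json.NewDecoder(r)
    m := Model{}
    fmt.Println(d.Decode(&m))
    fmt.Println(m.Name())
}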
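A variation on the clean-up idea, sketched under the assumption that the three digits are C-style octal byte values (so \303\203 stands for the two UTF-8 bytes 0xC3 0x83). That is a guess about how the files were produced, and the names octalEscape and decodeModel are made up for this example. Each escape is replaced with the byte it encodes, except for bytes that would break a JSON string, which are written as \u00XX instead:

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
)

// Model mirrors the struct from the question.
type Model struct {
	Name  string `json:"name"`
	Email string `json:"email"`
}

// octalEscape matches a backslash followed by three octal digits (\000 to \377).
var octalEscape = regexp.MustCompile(`\\[0-3][0-7]{2}`)

// cleanUp replaces every octal escape with the byte it encodes.
// Assumption: the digits are octal byte values, as in C string literals.
func cleanUp(data []byte) []byte {
	return octalEscape.ReplaceAllFunc(data, func(m []byte) []byte {
		b, err := strconv.ParseUint(string(m[1:]), 8, 8)
		if err != nil {
			return m // leave anything unexpected untouched
		}
		// Control characters, quotes and backslashes must stay escaped.
		if b < 0x20 || b == '"' || b == '\\' {
			return []byte(fmt.Sprintf(`\u%04x`, b))
		}
		return []byte{byte(b)}
	})
}

// decodeModel (hypothetical helper) cleans the input, then unmarshals it.
func decodeModel(data []byte) (Model, error) {
	var m Model
	err := json.Unmarshal(cleanUp(data), &m)
	return m, err
}

func main() {
	m, err := decodeModel([]byte(`{"name": "m\303\203ead"}`))
	fmt.Printf("%q %v\n", m.Name, err)
}
```

This sketch does not distinguish an escaped backslash followed by digits (\\303) from an octal escape, so it can corrupt input that contains one. For a stream of objects, the same function could be applied to each line (for example with bufio.Scanner) before unmarshaling it.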