golang convert byte array containing unicode
Question
type MyStruct struct {
	Value json.RawMessage `json:"value"`
}

var resp *http.Response
if resp, err = http.DefaultClient.Do(req); err == nil {
	if resp.StatusCode == 200 {
		var buffer []byte
		if buffer, err = ioutil.ReadAll(resp.Body); err == nil {
			mystruct = &MyStruct{}
			err = json.Unmarshal(buffer, mystruct)
		}
	}
}
fmt.Println(string(mystruct.Value))
It produces something like:
\u003Chead>\n \u003C/head>\n \u003Cbody>
The doc at http://golang.org/pkg/encoding/json/#Unmarshal says:
When unmarshaling quoted strings, invalid UTF-8 or invalid UTF-16 surrogate pairs are not treated as an error. Instead, they are replaced by the Unicode replacement character U+FFFD.
I kind of think this is what is going on, but I can't see the answer: my experience with Go is minimal and I'm tired.
Answer 1
Score: 6
There is a way to convert escaped Unicode characters in a json.RawMessage into plain valid UTF-8 characters without unmarshalling it. (I had to deal with the issue since my primary language is Korean.)
You can use strconv.Quote() and strconv.Unquote() to do the conversion.
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// Quote the raw JSON as a Go string, undo the doubled backslash in front of
// each "u", then unquote so that Go itself interprets the \uXXXX escapes.
// Caveat: this is a textual trick, not a JSON parser. An escaped quote or
// backslash (\u0022, \u005c) would come out unescaped and break the JSON.
func _UnescapeUnicodeCharactersInJSON(_jsonRaw json.RawMessage) (json.RawMessage, error) {
	str, err := strconv.Unquote(strings.Replace(strconv.Quote(string(_jsonRaw)), `\\u`, `\u`, -1))
	if err != nil {
		return nil, err
	}
	return []byte(str), nil
}

func main() {
	// Both are valid JSON.
	var jsonRawEscaped json.RawMessage   // json raw with escaped unicode chars
	var jsonRawUnescaped json.RawMessage // json raw with unescaped unicode chars
	// '\u263a' == '☺'
	jsonRawEscaped = []byte(`{"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}`) // "\\u263a"
	jsonRawUnescaped, _ = _UnescapeUnicodeCharactersInJSON(jsonRawEscaped)                        // "☺"
	fmt.Println(string(jsonRawEscaped))   // {"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}
	fmt.Println(string(jsonRawUnescaped)) // {"HelloWorld": "안녕, 세상(世上). ☺"}
}
https://play.golang.org/p/pUsrzrrcDG-
Hope this helps
Answer 2
Score: 3
You decided to use json.RawMessage to prevent parsing of the value with key value in your JSON message.
The string literal "\u003chtml\u003e" is a valid JSON representation of "<html>".
Since you told json.Unmarshal not to parse this part, it does not parse it and returns it to you as-is.
If you want to have it parsed into a UTF-8 string, then change the definition of MyStruct to:
type MyStruct struct {
	Value string `json:"value"`
}
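To make the difference concrete, here is a small sketch that decodes the same body twice, once into json.RawMessage and once into string. The type names rawStruct and stringStruct, the helper decodeBoth, and the sample body are assumptions for illustration; they do not come from the question.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawStruct keeps the JSON text of "value" untouched.
type rawStruct struct {
	Value json.RawMessage `json:"value"`
}

// stringStruct lets encoding/json decode "value" into a Go string.
type stringStruct struct {
	Value string `json:"value"`
}

// decodeBoth is a hypothetical helper that returns the raw JSON text and
// the decoded string for the same input.
func decodeBoth(body []byte) (raw, decoded string, err error) {
	var r rawStruct
	if err = json.Unmarshal(body, &r); err != nil {
		return "", "", err
	}
	var s stringStruct
	if err = json.Unmarshal(body, &s); err != nil {
		return "", "", err
	}
	return string(r.Value), s.Value, nil
}

func main() {
	// Made-up sample body for illustration.
	body := []byte(`{"value": "\u003Chead>\u003C/head>"}`)
	raw, decoded, err := decodeBoth(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(raw)     // "\u003Chead>\u003C/head>" (JSON text, quotes and escapes included)
	fmt.Println(decoded) // <head></head>
}
```

If the raw bytes are still needed elsewhere, keeping the json.RawMessage field and calling json.Unmarshal on it later with a string target should yield the same decoded text.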