Decode Marshalled JSON unicode
Question
I think the quickest way to explain my problem is with an example:
    package main
    import (
        "fmt"
        "encoding/json"
    )
    type JSON struct {
        Body string
    }
    func main() {
        body := "<html><body>Hello World</body></html>"
        obj := JSON{body}
        result, _ := json.Marshal(obj)
        fmt.Println(string(result))
    }
Output:
> {"Body":"\u003chtml\u003e\u003cbody\u003eHello World\u003c/body\u003e\u003c/html\u003e"}
I'd like the result to be a UTF-8 encoded string that reads the same as it went in. How can I achieve this? I tried to use utf8.DecodeRune in a loop:
    str := ""
    for _, res := range result {
        decoded, _ := utf8.DecodeRune(res)
        str += string(decoded)
    }
but that causes a compilation error:
> main.go:21: cannot use res (type byte) as type []byte in argument to utf8.DecodeRune
And calling DecodeRune on the marshalled object returns the first character, as you'd expect:
> {
Edit: I'm using Go 1.6.2, which apparently doesn't have SetEscapeHTML for whatever reason.
Answer 1
Score: 10
This is intended behavior. From the docs:
> String values encode as JSON strings coerced to valid UTF-8, replacing
> invalid bytes with the Unicode replacement rune. The angle brackets
> "<" and ">" are escaped to "\u003c" and "\u003e" to keep some browsers
> from misinterpreting JSON output as HTML. Ampersand "&" is also
> escaped to "\u0026" for the same reason. This escaping can be disabled
> using an Encoder that had SetEscapeHTML(false) called on it.
You can get the required result by using an Encoder and calling SetEscapeHTML(false) on it:
    // Requires "os" in the import list alongside "encoding/json".
    func main() {
        body := "<html><body>Hello World</body></html>"
        obj := JSON{body}
        enc := json.NewEncoder(os.Stdout)
        enc.SetEscapeHTML(false) // leave <, > and & unescaped
        enc.Encode(obj)
    }
Working example: https://play.golang.org/p/lMNCJ16dIo
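If the goal is a string rather than output on os.Stdout, the same Encoder can write into a bytes.Buffer instead. The following is a minimal sketch, assuming Go 1.7 or later (SetEscapeHTML was added to Encoder in Go 1.7, which is why Go 1.6.2 lacks it). The helper name marshalNoEscape is made up for this illustration and is not part of encoding/json. Encode appends a newline after the value, so the sketch trims it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

type JSON struct {
	Body string
}

// marshalNoEscape is a hypothetical helper: it works like json.Marshal
// but leaves <, > and & unescaped.
func marshalNoEscape(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encode writes a newline after the JSON value; drop it.
	return bytes.TrimRight(buf.Bytes(), "\n"), nil
}

func main() {
	out, err := marshalNoEscape(JSON{"<html><body>Hello World</body></html>"})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(string(out)) // {"Body":"<html><body>Hello World</body></html>"}
}
```

Checking the error from Encode matters here because, unlike the os.Stdout version, the caller goes on to use the returned bytes.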
Answer 2
Score: 3
Another way to achieve this is to replace the escaped characters with unescaped UTF-8 characters after marshalling. (I used to do this to make non-English letters human-readable in JSON.)
You can use strconv.Quote() and strconv.Unquote() to do the conversion.
    // Requires "encoding/json", "fmt", "strconv" and "strings".
    func _UnescapeUnicodeCharactersInJSON(_jsonRaw json.RawMessage) (json.RawMessage, error) {
        // Quote doubles every backslash; turning `\\u` back into `\u` lets Unquote decode the \uXXXX escapes.
        str, err := strconv.Unquote(strings.Replace(strconv.Quote(string(_jsonRaw)), `\\u`, `\u`, -1))
        if err != nil {
            return nil, err
        }
        return []byte(str), nil
    }
    func main() {
        // Both are valid JSON.
        var jsonRawEscaped json.RawMessage   // json raw with escaped unicode chars
        var jsonRawUnescaped json.RawMessage // json raw with unescaped unicode chars
        // '\u263a' == '☺'
        jsonRawEscaped = []byte(`{"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}`) // "\\u263a"
        jsonRawUnescaped, _ = _UnescapeUnicodeCharactersInJSON(jsonRawEscaped)                        // "☺"
        fmt.Println(string(jsonRawEscaped))   // {"HelloWorld": "\uC548\uB155, \uC138\uC0C1(\u4E16\u4E0A). \u263a"}
        fmt.Println(string(jsonRawUnescaped)) // {"HelloWorld": "안녕, 세상(世上). ☺"}
    }
https://play.golang.org/p/pUsrzrrcDG-
I hope this helps.
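For the question's specific case (Go 1.6.2, where only <, > and & are the problem), a narrower post-processing step may be safer than unescaping every \uXXXX sequence. The sketch below is a different technique from the strconv round trip above, not a rewrite of it: it reverses only the three HTML escapes that json.Marshal adds. Because it leaves all other escapes alone, it cannot turn something like \u0022 into a bare quote and break the JSON. The helper name unescapeHTML is hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// unescapeHTML is a hypothetical helper that reverses only the three HTML
// escapes json.Marshal adds (\u003c, \u003e, \u0026). It walks the JSON one
// escape at a time, so an escaped backslash followed by "u003c" is untouched.
func unescapeHTML(in []byte) []byte {
	out := make([]byte, 0, len(in))
	for i := 0; i < len(in); i++ {
		if in[i] != '\\' || i+1 >= len(in) {
			out = append(out, in[i])
			continue
		}
		if in[i+1] == 'u' && i+6 <= len(in) {
			switch string(bytes.ToLower(in[i+2 : i+6])) {
			case "003c":
				out = append(out, '<')
				i += 5
				continue
			case "003e":
				out = append(out, '>')
				i += 5
				continue
			case "0026":
				out = append(out, '&')
				i += 5
				continue
			}
		}
		// Any other escape: copy the backslash and the byte after it as a pair.
		out = append(out, in[i], in[i+1])
		i++
	}
	return out
}

func main() {
	raw, err := json.Marshal(map[string]string{"Body": "<b>Tom & Jerry</b>"})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(raw))               // escaped form produced by json.Marshal
	fmt.Println(string(unescapeHTML(raw))) // same JSON with <, > and & restored
}
```

Scanning the whole document rather than individual string values is acceptable here on the assumption that the input is valid JSON, where backslashes can only occur inside strings.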
Answer 3
Score: 0
By the way, here's the reason for the compiler error.
json.Marshal returns a byte slice ([]byte), not a string.
When you iterate over a byte slice using range, you are not iterating over its runes but over single bytes at a time. You can't pass a single byte to DecodeRune(): it expects a byte slice ([]byte) and returns the first rune encoded in it, a 32-bit integer value representing a Unicode code point, along with that rune's width in bytes. Runes are what you'd get if you iterate using range on a string.
Now, from what you are wanting to achieve, it doesn't look like you want DecodeRune at all.
The other answer adequately describes how to tell the JSON encoder not to escape < and > characters, i.e.
    enc := json.NewEncoder(os.Stdout)
    enc.SetEscapeHTML(false)
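To make the byte-versus-rune distinction concrete, here is a small sketch of how utf8.DecodeRune is meant to be called: on the remaining byte slice, advancing by the width it reports. The helper name decodeAll and the sample text are illustrative only. This does not solve the escaping problem from the question; it only shows why the original loop did not compile.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeAll is an illustrative helper that collects every rune in b by
// repeatedly calling utf8.DecodeRune on the bytes that remain.
func decodeAll(b []byte) []rune {
	var runes []rune
	for len(b) > 0 {
		// DecodeRune takes a []byte and returns the first rune plus its width in bytes.
		r, size := utf8.DecodeRune(b)
		runes = append(runes, r)
		b = b[size:]
	}
	return runes
}

func main() {
	b := []byte("héllo")
	fmt.Println("bytes:", len(b))
	fmt.Println("runes:", len(decodeAll(b)))

	// Ranging over a string yields (byte offset, rune) pairs directly.
	for i, r := range string(b) {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println()
}
```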