How to force UTF-8 encoding in golang
Question
I'm trying to parse a string into a regular JSON struct in Go. I don't control the original string, and it may contain unwanted characters, like this:
    // \x1c and \x1a stand for the raw control bytes present in the real input.
    originalstring := "{\"os\": \"\x1c09:@>A>DB Windows 8.1 \x1a>@?>@0B82=0O\"}"
    input := []byte(originalstring)
    var event JsonStruct
    parsingError := json.Unmarshal(input, &event)
When I try to parse this in Go, I get this error:
    invalid character '\x1c' in string literal
I previously handled this in Java like so:
    event = charset.decode(charset.encode(event)).toString();
    eventJSON = new JsonObject(event);
Any ideas?
Answer 1
Score: 3
You need to convert control characters to JSON Unicode escapes of the form \uYYYY, where each Y is a hexadecimal digit. A working example:
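    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "unicode"
    )

    // convert rewrites every control character in input as a JSON \uXXXX escape.
    // It also escapes control characters outside string literals, so it expects
    // compact JSON without newlines or tabs between tokens.
    func convert(input string) string {
        var buf bytes.Buffer
        for _, r := range input {
            if unicode.IsControl(r) {
                fmt.Fprintf(&buf, "\\u%04X", r)
            } else {
                buf.WriteRune(r)
            }
        }
        return buf.String()
    }

    func main() {
        // \x1c and \x1a stand for the raw control bytes in the original data.
        input := convert("{\"os\": \"\x1c09:@>A>DB Windows 8.1 \x1a>@?>@0B82=0O\"}")
        fmt.Println(input)
        js := []byte(input)
        t := struct {
            OS string
        }{}
        err := json.Unmarshal(js, &t)
        fmt.Println("error:", err)
        fmt.Println(t)
    }
Which produces (the control characters decoded into the last line are not printable, so they do not show):
    {"os": "\u001C09:@>A>DB Windows 8.1 \u001A>@?>@0B82=0O"}
    error: <nil>
    {09:@>A>DB Windows 8.1 >@?>@0B82=0O}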
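One caveat with the convert function above: it escapes every control character it sees, including newlines and tabs that sit between JSON tokens, which would make pretty-printed input invalid. A minimal sketch of a variant that escapes only inside string literals follows. The helper name escapeControlInStrings and the sample input are assumptions made for illustration, not part of the original answer.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// escapeControlInStrings is a hypothetical helper. It rewrites raw control
// characters (U+0000 through U+001F) as \uXXXX escapes, but only while inside
// a JSON string literal, so whitespace between tokens is left untouched.
func escapeControlInStrings(input string) string {
	var b strings.Builder
	inString := false // true between an opening and a closing quote
	escaped := false  // true if the previous byte was an unconsumed backslash
	for i := 0; i < len(input); i++ {
		c := input[i]
		switch {
		case !inString:
			if c == '"' {
				inString = true
			}
			b.WriteByte(c)
		case escaped:
			// Byte following a backslash: copy it and end the escape.
			escaped = false
			b.WriteByte(c)
		case c == '\\':
			escaped = true
			b.WriteByte(c)
		case c == '"':
			inString = false
			b.WriteByte(c)
		case c < 0x20:
			// Raw control character inside a string: escape it.
			fmt.Fprintf(&b, "\\u%04X", c)
		default:
			// Ordinary byte, including bytes of multi-byte UTF-8 sequences.
			b.WriteByte(c)
		}
	}
	return b.String()
}

func main() {
	// Pretty-printed input: the newlines and the tab sit outside the string
	// and must stay, while \x1c sits inside it and must be escaped.
	raw := "{\n\t\"os\": \"\x1cWindows 8.1\"\n}"
	fixed := escapeControlInStrings(raw)

	var t struct{ OS string }
	err := json.Unmarshal([]byte(fixed), &t)
	fmt.Println("error:", err)
	fmt.Printf("%q\n", t.OS)
}
```

Working on bytes rather than runes is a deliberate choice here: all characters of interest are ASCII, and copying the remaining bytes unchanged keeps any multi-byte UTF-8 sequences intact.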
Answer 2
Score: 2
According to the JSON specification (ECMA-404, RFC 8259), control characters (U+0000 through U+001F) must be escaped inside strings for the text to be valid JSON. If you want to preserve the control characters, you have to turn them into valid escape sequences; if you don't, you have to remove them before unmarshaling.
Here is an implementation of the latter:
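    package main

    import (
        "encoding/json"
        "fmt"
        "strings"
    )

    // stripCtlFromUTF8 drops ASCII control characters (below 32, and 127).
    func stripCtlFromUTF8(str string) string {
        return strings.Map(func(r rune) rune {
            if r >= 32 && r != 127 {
                return r
            }
            return -1 // a negative value tells strings.Map to drop the rune
        }, str)
    }

    func main() {
        // \x1c and \x1a stand for the raw control bytes in the original data.
        js := []byte(stripCtlFromUTF8("{\"os\": \"\x1c09:@>A>DB Windows 8.1 \x1a>@?>@0B82=0O\"}"))
        t := struct {
            OS string
        }{}
        err := json.Unmarshal(js, &t)
        fmt.Println("error:", err)
        fmt.Println(t)
    }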
On the playground: http://play.golang.org/p/QRtkS8LF1z
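On the title question itself: control characters such as \x1c are already valid UTF-8, so forcing the encoding does not remove them, which is why the answers above escape or strip them instead. If the input might also contain bytes that are not valid UTF-8, the standard library's strings.ToValidUTF8 (Go 1.13 and later) can replace them. The sketch below is only an illustration; the helper name forceUTF8 and the sample strings are assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// forceUTF8 is a hypothetical helper: it replaces each run of invalid UTF-8
// bytes with U+FFFD, the Unicode replacement character.
func forceUTF8(s string) string {
	return strings.ToValidUTF8(s, "\uFFFD")
}

func main() {
	bad := "Windows\xff 8.1" // \xff never appears in valid UTF-8
	ctl := "\x1cWindows 8.1" // a control character, but valid UTF-8

	// Report which inputs are valid UTF-8 to begin with.
	fmt.Println(utf8.ValidString(bad), utf8.ValidString(ctl))

	// The invalid byte is replaced; the control character is left as-is.
	fmt.Printf("%q\n", forceUTF8(bad))
	fmt.Printf("%q\n", forceUTF8(ctl))
}
```

Note that encoding/json already substitutes U+FFFD for invalid UTF-8 when it unmarshals quoted strings, so a step like this mainly matters when the cleaned string is used outside the decoder.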