How to force UTF-8 encoding in golang

Question

I'm trying to parse a string into a regular JSON struct in Go. I don't control the original string, and it may contain unwanted control characters, like this:

// The input contains the raw control bytes 0x1C and 0x1A, not six-character \u escapes.
originalstring := "{\"os\": \"\x1c09:@>A>DB Windows 8.1 \x1a>@?>@0B82=0O\"}"
input := []byte(originalstring)
var event JsonStruct
parsingError := json.Unmarshal(input, &event)

When I try to unmarshal this, I get the following error:

invalid character '\x1c' in string literal

In Java I previously handled this with:

event = charset.decode(charset.encode(event)).toString();
eventJSON = new JsonObject(event);

Any ideas?

Answer 1

Score: 3

You need to convert control characters into escaped Unicode code points in the notation \uYYYY, where each Y is a hexadecimal digit. A working example:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"unicode"
)

// convert replaces every control character with its \uXXXX escape.
func convert(input string) string {
	var buf bytes.Buffer
	for _, r := range input {
		if unicode.IsControl(r) {
			fmt.Fprintf(&buf, "\\u%04X", r)
		} else {
			buf.WriteRune(r)
		}
	}
	return buf.String()
}

func main() {
	// The input contains the raw control bytes 0x1C and 0x1A.
	input := convert("{\"os\": \"\x1c09:@>A>DB Windows 8.1 \x1a>@?>@0B82=0O\"}")
	fmt.Println(input)
	js := []byte(input)

	t := struct {
		OS string
	}{}

	err := json.Unmarshal(js, &t)
	fmt.Println("error:", err)
	fmt.Println(t)
}

Which produces (the decoded value on the last line still contains the two control characters; they are just not printable):

{"os": "\u001C09:@>A>DB Windows 8.1 \u001A>@?>@0B82=0O"}
error: <nil>
{09:@>A>DB Windows 8.1 >@?>@0B82=0O}
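One caveat with convert: unicode.IsControl also matches newlines and tabs, so on pretty-printed JSON the whitespace between tokens would be escaped as well and the result would no longer parse. A minimal sketch of a variant that escapes only inside string literals follows. The name escapeInStrings is a hypothetical helper, not part of the answer above, and the sketch assumes the input is otherwise well-formed JSON:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// escapeInStrings (hypothetical helper) escapes control characters
// U+0000..U+001F only while inside a JSON string literal and leaves
// whitespace between tokens untouched.
func escapeInStrings(input string) string {
	var b strings.Builder
	inString := false // currently inside a "..." literal
	escaped := false  // previous rune was a backslash inside a literal
	for _, r := range input {
		switch {
		case inString && escaped:
			// Rune following a backslash: copy it and end the escape.
			escaped = false
			b.WriteRune(r)
		case inString && r == '\\':
			escaped = true
			b.WriteRune(r)
		case r == '"':
			// An unescaped quote opens or closes a literal.
			inString = !inString
			b.WriteRune(r)
		case inString && r < 0x20:
			fmt.Fprintf(&b, "\\u%04X", r)
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// Pretty-printed input with raw 0x1C and 0x1A inside the string value.
	raw := "{\n\t\"os\": \"\x1cWindows 8.1\x1a\"\n}"
	fixed := escapeInStrings(raw)
	fmt.Println(json.Valid([]byte(raw)))   // false
	fmt.Println(json.Valid([]byte(fixed))) // true
}
```

If the input is always a single line, the simpler convert is enough.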

Answer 2

Score: 2

According to the ECMA standard for JSON (ECMA-404), control characters (U+0000 through U+001F) inside strings must be escaped for the document to be valid JSON. If you want to preserve your control characters, you'll have to turn them into valid escape sequences; if you don't, you'll have to remove them before unmarshaling.

Here is an implementation of the latter:

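package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stripCtlFromUTF8 drops ASCII control characters (code points below 32, and 127).
func stripCtlFromUTF8(str string) string {
	return strings.Map(func(r rune) rune {
		if r >= 32 && r != 127 {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, str)
}

func main() {
	// The input contains the raw control bytes 0x1C and 0x1A.
	js := []byte(stripCtlFromUTF8("{\"os\": \"\x1c09:@>A>DB Windows 8.1 \x1a>@?>@0B82=0O\"}"))

	t := struct {
		OS string
	}{}

	err := json.Unmarshal(js, &t)
	fmt.Println("error:", err)
	fmt.Println(t)
}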

On the playground: http://play.golang.org/p/QRtkS8LF1z
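On the "force UTF-8" part of the title: control characters are valid UTF-8, so they are a separate problem from malformed byte sequences. If the input may also contain bytes that are not valid UTF-8, one possible approach is strings.ToValidUTF8 from the standard library (Go 1.13 and later). This is only a sketch; forceUTF8 is a hypothetical wrapper name, and replacing bad bytes with U+FFFD is an assumption about what you want:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// forceUTF8 (hypothetical helper) replaces each run of invalid UTF-8 bytes
// with the Unicode replacement character U+FFFD.
func forceUTF8(s string) string {
	return strings.ToValidUTF8(s, "\uFFFD")
}

func main() {
	bad := "Windows\xff 8.1" // 0xFF can never appear in valid UTF-8
	fmt.Println(utf8.ValidString(bad)) // false

	fixed := forceUTF8(bad)
	fmt.Println(utf8.ValidString(fixed)) // true
	fmt.Printf("%+q\n", fixed)           // ASCII-only quoted form of the result
}
```

Note that json.Unmarshal already replaces invalid UTF-8 inside quoted strings with U+FFFD rather than reporting an error, so this step mostly matters when the string is used outside of JSON decoding. It can be combined with stripCtlFromUTF8 when both problems occur.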

huangapple
  • Posted on February 19, 2016, 06:51:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/35494074.html