2015年12月7日 12:59:35go评论189阅读模式

英文:

Convert unicode code point to literal character in Go

问题

让我们假设我有一个像这样的文本文件。

\u0053
\u0075
\u006E

有没有办法将其转换为这样？

S
u
n

目前，我正在使用ioutil.ReadFile("data.txt")，但是当我打印数据时，我得到的是Unicode代码点，而不是字符串文字。我意识到这是ReadFile的正确行为，但这不是我想要的。

我希望将代码点替换为它们的文字字符。

英文:

Let's say I have a text file like this.

\u0053
\u0075
\u006E

Is there a way I can convert that to this?

S
u
n

Currently, I'm using ioutil.ReadFile("data.txt"), but when I print the data, I get the unicode code points instead of the string literals. I realize this is the correct behavior for ReadFile, it's just not want I want.

I'm aiming for a substitution of the code points with their literal characters.

答案1

得分: 7

你可以使用strconv.Unquote()和strconv.UnquoteChar()函数进行转换。

需要注意的一点是，strconv.Unquote()只能解析带引号的字符串（例如以引号字符"或反引号字符`开头和结尾），所以我们需要手动添加引号。

看看这个例子：

lines := []string{
    `\u0053`,
    `\u0075`,
    `\u006E`,
}
fmt.Println(lines)

for i, v := range lines {
    var err error
    lines[i], err = strconv.Unquote(`&quot;` + v + `&quot;`)
    if err != nil {
        fmt.Println(err)
    }
}
fmt.Println(lines)

fmt.Println(strconv.Unquote(`&quot;Go\u0070\x68\x65\x72&quot;`))

输出结果（在Go Playground上尝试）：

[\u0053 \u0075 \u006E]
[S u n]
Gopher &lt;nil&gt;

如果你想解析的字符串包含单个rune的转义序列（或者只想解析第一个rune），你可以使用strconv.UnquoteChar()。示例如下（注意：在这种情况下不需要对输入进行引号处理，就像对strconv.Unquote()所需的那样）：

runes := []string{
    `\u0053`,
    `\u0075`,
    `\u006E`,
}
fmt.Println(runes)

for _, v := range runes {
    var err error
    value, _, _, err := strconv.UnquoteChar(v, 0)
    if err != nil {
        fmt.Println(err)
    }
    fmt.Printf("%c\n", value)
}

输出结果（在Go Playground上尝试）：

[\u0053 \u0075 \u006E]
S
u
n

英文:

You can use the strconv.Unquote() and strconv.UnquoteChar() functions to do the conversion.

One thing you should be aware of is that strconv.Unquote() can only unquote strings that are in quotes (e.g. start and end with a quote char " or a back quote char `), so we have to manually append that.

See this example:

lines := []string{
	`\u0053`,
	`\u0075`,
	`\u006E`,
}
fmt.Println(lines)

for i, v := range lines {
	var err error
	lines[i], err = strconv.Unquote(`&quot;` + v + `&quot;`)
	if err != nil {
		fmt.Println(err)
	}
}
fmt.Println(lines)

fmt.Println(strconv.Unquote(`&quot;Go\u0070\x68\x65\x72&quot;`))

Output (try it on the Go Playground):

[\u0053 \u0075 \u006E]
[S u n]
Gopher &lt;nil&gt;

If the strings you want to unquote contain the escape sequence of a single rune (or you just want to unquote the first rune), you may use strconv.UnquoteChar(). This is how it looks like (note: no quoting of the input is needed in this case, like it was needed for strconv.Unquote()):

runes := []string{
	`\u0053`,
	`\u0075`,
	`\u006E`,
}
fmt.Println(runes)

for _, v := range runes {
	var err error
	value, _, _, err := strconv.UnquoteChar(v, 0)
	if err != nil {
		fmt.Println(err)
	}
	fmt.Printf(&quot;%c\n&quot;, value)
}

This will output (try it on the Go Playground):

[\u0053 \u0075 \u006E]
S
u
n

答案2

得分: 3

稍微不同的方法是使用strconv.ParseInt，这样可以生成更少的垃圾并且使用更少的内部逻辑（Unquote执行了很多其他检查）来解析行：

for i, v := range lines {
    if len(v) != 6 {
        continue
    }

    if r, err := strconv.ParseInt(v[2:], 16, 32); err == nil {
        lines[i] = string(r)
    }
}

playground

英文:

A slightly different approach is using strconv.ParseInt, this generates less garbage and uses less internal logic (Unquote does a lot of other checks) for parsing the lines:

for i, v := range lines {
	if len(v) != 6 {
		continue
	}

	if r, err := strconv.ParseInt(v[2:], 16, 32); err == nil {
		lines[i] = string(r)
	}
}

<kbd>playground</kbd>

答案3

得分: 1

你可以使用以下代码：

import "github.com/chzyer/readline/runes"

// unicodeUnquote将unicode点（如\u0053）转换为UTF8编码。
func unicodeUnquote(bs []byte) []byte {
	unicodeEscapeRx := regexp.MustCompile(`\\u[0-9a-fA-F]{4}`)
	return unicodeEscapeRx.ReplaceAllFunc(bs, func(code []byte) []byte {
		rune, _, _, _ := strconv.UnquoteChar(string(code), 0)
		width := runes.Width(rune)
		runeBytes := make([]byte, width)
		utf8.EncodeRune(runeBytes, rune)
		return runeBytes
	})
}

完整示例可在https://go.dev/play/p/ElIGZvJNyEF中找到。

英文:

You can use this:

import &quot;github.com/chzyer/readline/runes&quot;

// unicodeUnquote converts unicode points such as \u0053 to UTF8 encoding.
func unicodeUnquote(bs []byte) []byte {
	unicodeEscapeRx := regexp.MustCompile(`\\u[0-9a-fA-F]{4}`)
	return unicodeEscapeRx.ReplaceAllFunc(bs, func(code []byte) []byte {
		rune, _, _, _ := strconv.UnquoteChar(string(code), 0)
		width := runes.Width(rune)
		runeBytes := make([]byte, width)
		utf8.EncodeRune(runeBytes, rune)
		return runeBytes
	})
}

A full example is at https://go.dev/play/p/ElIGZvJNyEF.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Convert unicode code point to literal character in Go

问题

答案1

答案2

答案3

解析带有绑定的数组表单元素

为什么我们不能在const中声明一个map，并在之后填充它呢？

Go中的数字类型推断

Gorm：无法添加或更新子行 – 自引用的外键约束失败

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论