Convert unicode code point to literal character in Go
Question
Let's say I have a text file like this.
\u0053
\u0075
\u006E
Is there a way I can convert that to this?
S
u
n
Currently, I'm using ioutil.ReadFile("data.txt"), but when I print the data, I get the Unicode code point escapes (such as \u0053) instead of the literal characters. I realize this is the correct behavior for ReadFile; it's just not what I want.
I'm aiming for a substitution of the code points with their literal characters.
Answer 1
Score: 7
You can use the strconv.Unquote() and strconv.UnquoteChar() functions to do the conversion.
One thing you should be aware of is that strconv.Unquote() only accepts input that is itself quoted (e.g. it starts and ends with a double quote " or a back quote `). A back-quoted string is treated as a raw string, in which escape sequences are not interpreted, so we have to wrap each line in double quotes manually.
See this example:
lines := []string{
	`\u0053`,
	`\u0075`,
	`\u006E`,
}
fmt.Println(lines)
for i, v := range lines {
	// Unquote expects a quoted literal, so wrap the line in double quotes.
	s, err := strconv.Unquote(`"` + v + `"`)
	if err != nil {
		fmt.Println(err)
		continue // keep the original line if it cannot be unquoted
	}
	lines[i] = s
}
fmt.Println(lines)
fmt.Println(strconv.Unquote(`"Go\u0070\x68\x65\x72"`))
Output (try it on the Go Playground):
[\u0053 \u0075 \u006E]
[S u n]
Gopher <nil>
If the strings you want to unquote contain the escape sequence of a single rune (or you just want to unquote the first rune), you may use strconv.UnquoteChar(). This is what it looks like (note: unlike with strconv.Unquote(), the input does not need to be wrapped in quotes):
runes := []string{
	`\u0053`,
	`\u0075`,
	`\u006E`,
}
fmt.Println(runes)
for _, v := range runes {
	// A quote argument of 0 means no quote character gets special treatment.
	value, _, _, err := strconv.UnquoteChar(v, 0)
	if err != nil {
		fmt.Println(err)
		continue
	}
	fmt.Printf("%c\n", value)
}
This will output (try it on the Go Playground):
[\u0053 \u0075 \u006E]
S
u
n
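strconv.UnquoteChar() also returns the rest of the string after the decoded rune, so it can be called in a loop to decode a whole string without wrapping it in quotes. Here is a rough sketch of that idea; the function name unescapeAll is an assumption made for illustration, and the byte handling mirrors what strconv.Unquote() does for \x escapes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// unescapeAll (hypothetical helper) decodes every escape sequence in s,
// one character at a time, using the tail returned by UnquoteChar.
func unescapeAll(s string) (string, error) {
	var b strings.Builder
	for len(s) > 0 {
		r, multibyte, tail, err := strconv.UnquoteChar(s, 0)
		if err != nil {
			return "", err
		}
		if r < utf8.RuneSelf || !multibyte {
			// ASCII, or a single byte written as \xNN or an octal escape.
			b.WriteByte(byte(r))
		} else {
			b.WriteRune(r)
		}
		s = tail
	}
	return b.String(), nil
}

func main() {
	// With quote set to 0, unescaped double quotes in the input are allowed.
	fmt.Println(unescapeAll(`Go\u0070\x68\x65\x72 says "hi"`))
}
```

Compared with wrapping the input for strconv.Unquote(), this variant tolerates quote characters in the data, at the cost of a little more code.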
Answer 2
Score: 3
A slightly different approach is to use strconv.ParseInt. This generates less garbage and uses less internal logic for parsing the lines (Unquote does a lot of other checks):
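for i, v := range lines {
	// Only handle lines of the exact form \uXXXX.
	if len(v) != 6 || v[:2] != `\u` {
		continue
	}
	if r, err := strconv.ParseInt(v[2:], 16, 32); err == nil {
		// Convert via rune: string(r) on an int64 is flagged by go vet.
		lines[i] = string(rune(r))
	}
}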
Answer 3
Score: 1
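You can use this, which needs only the standard library (regexp, strconv, unicode/utf8):
// unicodeEscapeRx matches escapes such as \u0053; it is compiled once.
var unicodeEscapeRx = regexp.MustCompile(`\\u[0-9a-fA-F]{4}`)

// unicodeUnquote converts Unicode escapes such as \u0053 to their UTF-8 encoding.
func unicodeUnquote(bs []byte) []byte {
	return unicodeEscapeRx.ReplaceAllFunc(bs, func(code []byte) []byte {
		r, _, _, err := strconv.UnquoteChar(string(code), 0)
		if err != nil {
			return code // e.g. a lone surrogate half: leave it untouched
		}
		// utf8.RuneLen gives the UTF-8 byte length of r
		// (a display width would be too small for many runes).
		runeBytes := make([]byte, utf8.RuneLen(r))
		utf8.EncodeRune(runeBytes, r)
		return runeBytes
	})
}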
A full example is at https://go.dev/play/p/ElIGZvJNyEF.
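The pattern above matches one four-digit escape at a time, so a character outside the Basic Multilingual Plane that is written as a UTF-16 surrogate pair (as JSON does, for example \uD83D\uDE00) is not combined into a single character. A possible variant, sketched below under that assumption about the input, decodes each run of consecutive escapes as UTF-16 with unicode/utf16 from the standard library. The names escapeRunRx and decodeUTF16Escapes are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"unicode/utf16"
)

// escapeRunRx (hypothetical name) matches one or more consecutive \uXXXX escapes.
var escapeRunRx = regexp.MustCompile(`(?:\\u[0-9a-fA-F]{4})+`)

// decodeUTF16Escapes (hypothetical helper) treats each run of \uXXXX escapes
// as a sequence of UTF-16 code units and replaces it with the decoded text.
func decodeUTF16Escapes(s string) string {
	return escapeRunRx.ReplaceAllStringFunc(s, func(run string) string {
		units := make([]uint16, 0, len(run)/6)
		for i := 0; i < len(run); i += 6 {
			// Skip the leading \u and parse the four hex digits.
			n, err := strconv.ParseUint(run[i+2:i+6], 16, 16)
			if err != nil {
				return run // should not happen given the pattern
			}
			units = append(units, uint16(n))
		}
		// utf16.Decode combines surrogate pairs; a lone surrogate
		// becomes the replacement character U+FFFD.
		return string(utf16.Decode(units))
	})
}

func main() {
	fmt.Println(decodeUTF16Escapes(`\u0053\u0075\u006E \uD83D\uDE00`))
}
```

Note that this sketch, like the pattern it builds on, does not distinguish an escaped backslash followed by u from a real escape; whether that matters depends on the input format.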