option for \u instead of Unicode replacement

Question

If I run this Go code:

```go
package main

import (
	"encoding/json"
	"os"
)

func main() {
	json.NewEncoder(os.Stdout).Encode("\xa1") // "\ufffd"
}
```

I lose data, since once the Unicode replacement is done, I can no longer get
back the original value. Compare with this Python code:

```python
import json

a = '\xa1'
b = json.dumps(a)  # "\u00a1"
print(json.loads(b) == a)  # True
```

No replacement is done, so no data is lost, and the resulting JSON is still
valid. Does Go have some way to encode a JSON string using escaping instead of
replacement?
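A quick way to observe the loss described in the question is to marshal a string, unmarshal it again, and compare the result with the input. The helper name `roundTrips` is hypothetical; it only relies on the standard `json.Marshal` and `json.Unmarshal` functions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrips reports whether s survives a JSON encode/decode cycle unchanged.
// Hypothetical helper, for illustration only.
func roundTrips(s string) bool {
	b, err := json.Marshal(s)
	if err != nil {
		return false
	}
	var back string
	if err := json.Unmarshal(b, &back); err != nil {
		return false
	}
	return back == s
}

func main() {
	fmt.Println(roundTrips("\xa1"))   // false: the lone byte 0xa1 comes back as U+FFFD
	fmt.Println(roundTrips("\u00a1")) // true: U+00A1 is valid UTF-8 (bytes 0xc2 0xa1)
}
```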

Answer 1

Score: 1

This example is a false equivalence. In Python, '\xa1' is a valid Unicode string holding the single character U+00A1; it is just one possible way of writing it, like '\u00a1', '\U000000a1', chr(0xa1), '\N{INVERTED EXCLAMATION MARK}', '¡', and so on.

The equivalent in Python code would be:

```
>>> import json
>>> print(json.dumps(b'\xa1'.decode(errors='replace')))
"\ufffd"
```

This also prints an ASCII representation of the coerced REPLACEMENT CHARACTER (U+FFFD) to stdout, the same as Go does.

Answer 2

Score: 0

This is because, in Go, "\xa1" is not a valid Unicode (UTF-8) string. It contains the single byte 0xa1, which is not valid by itself (it is a UTF-8 continuation byte with no lead byte in front of it). The invalid byte gets replaced with U+FFFD, the “replacement character”, which is used when the input is invalid.

If you want to encode the Unicode character U+00A1, write it as "\u00a1". If you need arbitrary bytes to survive a round trip through JSON, you will have to represent them differently, for example by base64-encoding them.

Python simply works differently: in a Python string literal, the \xa1 escape sequence denotes U+00A1. In Go, \xa1 is the byte 0xa1, which is not a valid UTF-8 string by itself and therefore cannot be encoded as a JSON string without being replaced.

huangapple
  • Posted on 29 November 2022, 09:40:02
  • If you repost this article, please keep its link: https://go.coder-hub.com/74608264.html