Invalid Unicode code point 0xd83f
Question
I'm trying to port some Java to Go. The Java code has a character variable with the value `'\ud83f'`. When I try to use this value in Go, it doesn't compile:

```go
package main
func main() {
	c := '\ud83f'
	println(c)
}
```

```
$ go run a.go
# command-line-arguments
./a.go:3: invalid Unicode code point in escape sequence: 0xd83f
```
Why? I also tried making a string with that value in Python and it worked too. It's just not working in Go for some reason.
Answer 1
Score: 5
The rune literal you tried to use is invalid because it denotes a surrogate code point. The spec says that some values are illegal in rune literals, naming surrogate halves in particular:
> Rune Literals
>
> [...]
>
> The escapes \u and \U represent Unicode code points so within them
> some values are illegal, in particular those above 0x10FFFF and
> surrogate halves.
Further below in the examples, you can see another case which is deemed illegal:
> '\U00110000' // illegal: invalid Unicode code point
This shows that invalid code points, such as those above 0x10FFFF, are likewise illegal in rune literals.
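At run time, the standard `unicode/utf8` package can tell you which integer values count as valid code points: `utf8.ValidRune` returns false for surrogate halves and for anything above U+10FFFF. The sketch below is illustrative only; `report` is a hypothetical helper, and every value is written as a plain integer (not a `\u`/`\U` escape) so that the file compiles.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// report formats each value together with utf8.ValidRune's verdict.
// (report is a hypothetical helper written for this illustration.)
func report(rs []rune) []string {
	out := make([]string, 0, len(rs))
	for _, r := range rs {
		// ValidRune is false for surrogate halves (U+D800..U+DFFF)
		// and for anything above U+10FFFF.
		out = append(out, fmt.Sprintf("%#x valid=%v", r, utf8.ValidRune(r)))
	}
	return out
}

func main() {
	// Plain integer constants, not \u / \U escapes, so this compiles even
	// though two of the values are not valid code points.
	for _, line := range report([]rune{'a', 0x10FFFF, 0xd83f, 0x110000}) {
		fmt.Println(line)
	}
}
```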
Note that since `rune` is merely an alias for `int32`, you can simply write:

```go
var r rune = 0xd83f
```

instead of

```go
var r rune = '\ud83f'
```

And if you wanted a value above 0x10FFFF, you could write

```go
var r rune = 0x11ffff
```

instead of

```go
var r rune = '\U0011ffff'
```
Answer 2
Score: 3
As already mentioned, `\ud83f` is a surrogate half (a high surrogate, specifically), used in UTF-16 encoding.
A surrogate half is not considered a valid code point on its own, and the Go specification explicitly states:
> The escapes \u and \U represent Unicode code points so within them
> some values are illegal, in particular those above 0x10FFFF and
> surrogate halves.
If you want a rune with this invalid code point, you can do the following:
```go
c := rune(0xd83f)
```
However, the correct way to handle such a value is to first decode the two surrogate halves and then use the resulting _valid_ code point.