Invalid Unicode code point 0xd83f

Question

I'm trying to port some Java to Go. The Java code has a character variable with the value '\ud83f'. When I try to use this value in Go, it doesn't compile:

package main
func main() {
    c := '\ud83f'
    println(c)
}

$ go run a.go
# command-line-arguments
./a.go:3: invalid Unicode code point in escape sequence: 0xd83f

Why? The value works in Java, and when I built a string with the same value in Python, that worked too. Only Go rejects it.

Answer 1

Score: 5

The rune literal you tried to use is invalid because it denotes a surrogate code point. The spec says that rune literals cannot denote surrogate code points, and it rules out some other values as well:

> Rune Literals
>
> [...]
>
> The escapes \u and \U represent Unicode code points so within them
> some values are illegal, in particular those above 0x10FFFF and
> surrogate halves.

Further down, the spec's examples show another case that is deemed illegal:

> '\U00110000' // illegal: invalid Unicode code point

This suggests that values above 0x10FFFF, which are not Unicode code points at all, are also illegal in rune literals.
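As a rough way to see which values fall into the two illegal classes, the standard library's `utf8.ValidRune` rejects both surrogate halves and values above 0x10FFFF. The sketch below assumes that this check matches what the compiler accepts in `\u` and `\U` escapes; the helper name `isScalarValue` is made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isScalarValue is a hypothetical helper: it reports whether r is a value
// that can be encoded as UTF-8, i.e. not a surrogate half and not above
// 0x10FFFF. These are the two classes the spec quote names as illegal.
func isScalarValue(r rune) bool {
	return utf8.ValidRune(r)
}

func main() {
	// 0xd83f is the surrogate from the question; 0x110000 is the
	// out-of-range value from the spec's example.
	for _, r := range []rune{0x41, 0xd83f, 0x10ffff, 0x110000} {
		fmt.Printf("%#x valid: %v\n", r, isScalarValue(r))
	}
}
```

Only 0x41 and 0x10ffff are reported as valid here.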

Note that since rune is merely an alias for int32, you can simply do:

var r rune = 0xd8f3

instead of

var r rune = '\ud8f3'

And if you want a value above 0x10FFFF, you can write:

var r rune = 0x11ffff

instead of

var r rune = '\U0011ffff'
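Keep in mind that a rune built from an integer this way still does not survive conversion to a string: Go substitutes U+FFFD, the replacement character, for surrogates and out-of-range values. A small sketch, where `encode` is a made-up helper name:

```go
package main

import "fmt"

// encode is a hypothetical helper that converts a rune to a string the same
// way a plain string(r) conversion does.
func encode(r rune) string {
	return string(r)
}

func main() {
	// Both assignments compile, because the right-hand sides are plain
	// integer constants rather than \u or \U escapes.
	var surrogate rune = 0xd8f3
	var tooLarge rune = 0x11ffff

	// Neither value is a valid code point, so each one is encoded as
	// U+FFFD (bytes ef bf bd) instead of its own value.
	for _, r := range []rune{surrogate, tooLarge} {
		s := encode(r)
		fmt.Printf("%#x -> %+q (% x)\n", r, s, s)
	}
}
```

So the integer form is useful when the rune is treated as a number, for example in comparisons or arithmetic, but not as a way to get a lone surrogate into a Go string.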

Answer 2

Score: 3

As already mentioned, \ud83f is a surrogate half (a high surrogate), used in the UTF-16 encoding.
It is not considered a valid code point, and the Go specification explicitly states:

> The escapes \u and \U represent Unicode code points so within them
> some values are illegal, in particular those above 0x10FFFF and
> surrogate halves.

If you want a rune with this invalid code point, you can do the following:

c := rune(0xd83f)

However, the correct way to handle such a value is to first decode the two surrogate halves and then use the resulting valid code point.
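A minimal sketch of that decoding step, using `utf16.DecodeRune` from the standard library. The question only shows the high surrogate 0xd83f, so the low surrogate 0xdc00 used here is an assumption chosen purely for illustration, and `combine` is a made-up helper name.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// combine is a hypothetical helper that joins a UTF-16 surrogate pair into a
// single code point. utf16.DecodeRune returns U+FFFD when hi and lo do not
// form a valid pair, which is reported here as ok == false.
func combine(hi, lo rune) (r rune, ok bool) {
	r = utf16.DecodeRune(hi, lo)
	return r, r != '\uFFFD'
}

func main() {
	hi := rune(0xd83f) // high surrogate from the question
	lo := rune(0xdc00) // assumed low surrogate, not from the original code

	if r, ok := combine(hi, lo); ok {
		fmt.Printf("%U\n", r) // prints U+1FC00
	} else {
		fmt.Println("not a valid surrogate pair")
	}
}
```

When porting Java code that stores text as `char` values, it may be simpler to convert a whole `[]uint16` at once with `utf16.Decode`, which handles pairing and replaces unpaired surrogates with U+FFFD.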

huangapple
  • Posted on August 29, 2014, 04:18:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/25557314.html