2014-08-29 04:18:56 · go

Invalid Unicode code point 0xd83f

Question

I'm trying to port some Java to Go. The Java code has a character variable with the value '\ud83f'. When I try to use this value in Go, it doesn't compile:

package main
func main() {
    c := '\ud83f'
    println(c)
}

$ go run a.go
# command-line-arguments
./a.go:3: invalid Unicode code point in escape sequence: 0xd83f

Why? I also tried making a string with that value in Python and it worked too. It's just not working in Go for some reason.

Answer 1

Score: 5

The rune literal you tried to use is invalid because it denotes a surrogate code point. The spec says that the \u and \U escapes in rune literals cannot denote surrogate halves, nor values above 0x10FFFF:

> Rune Literals
>
> [...]
>
> The escapes \u and \U represent Unicode code points so within them
> some values are illegal, in particular those above 0x10FFFF and
> surrogate halves.

Further below in the examples, you can see another case which is deemed illegal:

> '\U00110000' // illegal: invalid Unicode code point

This confirms that invalid code points (such as those above 0x10FFFF) are also illegal in rune literals.
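To make the rule concrete, here is a minimal sketch that checks a few values at run time. It assumes that `utf8.ValidRune` from the standard library is a fair stand-in for the rule quoted above (it rejects surrogate halves and out-of-range values); the helper name `isLegalEscapeValue` is made up for this example and is not something the compiler exposes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isLegalEscapeValue (hypothetical helper) reports whether v is a Unicode
// code point that is neither a surrogate half nor above 0x10FFFF.
func isLegalEscapeValue(v rune) bool {
	return utf8.ValidRune(v)
}

func main() {
	// 0xd83f is the surrogate half from the question; 0x110000 is the
	// out-of-range value from the spec example.
	for _, v := range []rune{0x41, 0xd83f, 0x10ffff, 0x110000} {
		fmt.Printf("%#x legal=%v\n", v, isLegalEscapeValue(v))
	}
}
```

Only 0x41 and 0x10ffff should be reported as legal.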

Note that since rune is merely an alias for int32, you can simply do:

var r rune = 0xd83f

instead of

var r rune = '\ud83f'

And if you want a value above 0x10FFFF, you can write

var r rune = 0x11ffff

instead of

var r rune = '\U0011ffff'
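One caveat with the numeric workaround, shown as a small sketch: the value is stored as given, but it is still not a valid code point, so converting it to a string substitutes the replacement character U+FFFD. The helper name `toText` is only for illustration.

```go
package main

import "fmt"

// toText (hypothetical helper) converts a rune to a string exactly the
// way the built-in conversion string(r) does.
func toText(r rune) string {
	return string(r)
}

func main() {
	// Compiles: this is a plain integer constant, so the rule for
	// \u and \U escapes does not apply.
	var r rune = 0xd83f
	fmt.Printf("%#x\n", r)

	// The conversion cannot encode a surrogate half as UTF-8, so the
	// result is the replacement character instead.
	fmt.Println(toText(r) == "\uFFFD")
}
```

So the integer form is fine for arithmetic or comparisons, but the value does not survive a round trip through a Go string.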

Answer 2

Score: 3

As already mentioned, \ud83f is a surrogate half (a high surrogate), used in UTF-16 encoding.
It is not considered a valid code point, and the Go specification explicitly states:

> The escapes \u and \U represent Unicode code points so within them
> some values are illegal, in particular those above 0x10FFFF and
> surrogate halves.

If you want a rune with this invalid code point, you can do the following:

c := rune(0xd83f)

However, the correct way to handle such a value is to first decode the two surrogate halves and then use the resulting valid code point.
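A minimal sketch of that decoding step, using `utf16.DecodeRune` from the standard library. The question only gives the high half (0xd83f), so the low half 0xdc00 below is an assumption picked purely for illustration, and `combine` is a made-up wrapper name.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// combine (hypothetical helper) joins a high and a low UTF-16 surrogate
// into one code point. utf16.DecodeRune returns U+FFFD if the two values
// do not form a valid surrogate pair.
func combine(high, low rune) rune {
	return utf16.DecodeRune(high, low)
}

func main() {
	high := rune(0xd83f) // the value from the question
	low := rune(0xdc00)  // assumed low half, not taken from the question

	fmt.Println(utf16.IsSurrogate(high))   // true
	fmt.Printf("%U\n", combine(high, low)) // U+1FC00
}
```

In the Java code being ported, the matching low surrogate would normally be the next `char` in the string, so the two should be read together rather than one at a time.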
