January 19, 2016, 02:20:55 · go

How to detect when bytes can't be converted to string in Go?

Question

There are invalid byte sequences that can't be converted to Unicode strings. How do I detect that when converting []byte to string in Go?

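You can use the utf8.Valid function to check whether a byte sequence is valid UTF-8. It takes a []byte and returns a bool reporting whether the bytes consist entirely of valid UTF-8-encoded runes.

Here is an example:

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	data := []byte{0xC3, 0x28} // invalid UTF-8 byte sequence

	if utf8.Valid(data) {
		str := string(data)
		fmt.Println("conversion succeeded:", str)
	} else {
		fmt.Println("invalid byte sequence")
	}
}

The example above defines a []byte holding an invalid sequence (0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte) and checks it with utf8.Valid. If the bytes are valid, it converts them to a string and prints it; otherwise it prints a message.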
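If invalid input should be reported to the caller rather than just printed, the same check can be wrapped in a small helper. This is only a sketch: the names toValidString and ErrInvalidUTF8 are hypothetical and not part of the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// ErrInvalidUTF8 is a hypothetical sentinel error used by this sketch.
var ErrInvalidUTF8 = errors.New("input is not valid UTF-8")

// toValidString (hypothetical helper) converts b to a string only if b is
// entirely valid UTF-8; otherwise it returns ErrInvalidUTF8.
func toValidString(b []byte) (string, error) {
	if !utf8.Valid(b) {
		return "", ErrInvalidUTF8
	}
	return string(b), nil
}

func main() {
	inputs := [][]byte{[]byte("héllo"), {0xC3, 0x28}}
	for _, in := range inputs {
		s, err := toValidString(in)
		if err != nil {
			fmt.Printf("%x: %v\n", in, err)
			continue
		}
		fmt.Printf("%x: %q\n", in, s)
	}
}
```

For input that is already a string, utf8.ValidString performs the same check without a conversion.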

Answer 1

Score: 27

You can, as Tim Cooper noted, test UTF-8 validity with utf8.Valid.

But! You might be thinking that converting non-UTF-8 bytes to a Go string is impossible. In fact, "In Go, a string is in effect a read-only slice of bytes"; it can contain bytes that aren't valid UTF-8 which you can print, access via indexing, pass to WriteString methods, or even round-trip back to a []byte (to Write, say).
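A minimal sketch of that point: the conversions below copy bytes without validating or replacing anything. The helper name roundTrip is made up for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// roundTrip (hypothetical helper) converts b to a string and back.
// Neither conversion decodes or validates the bytes.
func roundTrip(b []byte) []byte {
	s := string(b)
	return []byte(s)
}

func main() {
	in := []byte{'a', 0xff, 'b'} // 0xff never appears in valid UTF-8
	s := string(in)

	fmt.Println(len(s))                         // 3 (length in bytes)
	fmt.Printf("%#x\n", s[1])                   // 0xff (indexing yields the raw byte)
	fmt.Println(bytes.Equal(roundTrip(in), in)) // true
}
```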

There are two places in the language where Go does UTF-8 decoding of strings for you:

When you write for i, r := range s, each r is a Unicode code point, a value of type rune.
When you do the conversion []rune(s), Go decodes the whole string to runes.

(Note that rune is an alias for int32, not a completely different type.)

In both these instances invalid UTF-8 is replaced with U+FFFD, the replacement character reserved for uses like this. The spec sections on for statements and on conversions between strings and other types have more detail. These conversions never crash, so you only need to actively check for UTF-8 validity if it's relevant to your application, for example if you can't accept the U+FFFD replacement and need to return an error on mis-encoded input.

Since that behavior's baked into the language, you can expect it from libraries, too. U+FFFD is available as the constant utf8.RuneError, and the decoding functions in the utf8 package return it for invalid input.
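One consequence worth sketching: a string may legitimately contain U+FFFD, so seeing that rune does not by itself prove the input was invalid. utf8.DecodeRuneInString reports an invalid encoding as utf8.RuneError with a width of 1, whereas a genuine U+FFFD is three bytes wide. The helper name firstInvalidOffset below is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstInvalidOffset (hypothetical helper) returns the byte offset of the
// first invalid UTF-8 sequence in s, or -1 if s is entirely valid.
func firstInvalidOffset(s string) int {
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		// An invalid encoding decodes as RuneError with width 1;
		// a real U+FFFD in the input has width 3.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	fmt.Println(firstInvalidOffset("ok\uFFFD")) // -1: a genuine U+FFFD is valid
	fmt.Println(firstInvalidOffset("ab\xffcd")) // 2: 0xff is not valid UTF-8
}
```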

Here's a sample program showing what Go does with a []byte holding invalid UTF-8:

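package main

import "fmt"

func main() {
	a := []byte{0xff} // 0xff is never valid in UTF-8
	s := string(a)    // the conversion keeps the raw byte
	fmt.Println(s)
	// range decodes the string; the invalid byte yields U+FFFD (65533)
	for _, r := range s {
		fmt.Println(r)
	}
	// []rune(s) decodes the whole string the same way
	rs := []rune(s)
	fmt.Println(rs)
}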

Output will look different in different environments, but in the Playground it looks like this:

�
65533
[65533]
