September 26, 2021 23:52:06 · go

Is a non-English string still 'a read-only slice of bytes'?

Question

It's mentioned in https://go.dev/blog/strings that:

> In Go, a string is in effect a read-only slice of bytes.

To my understanding, the byte data type in Go is equivalent to uint8 and represents ASCII characters, which works perfectly for strings that consist of English letters only.

For non-English strings, such as Japanese, Korean, Chinese, Arabic, etc., is it still correct to say "In Go, a string is in effect a read-only slice of bytes"?

Or can I say "In Go, a non-English string is in effect a read-only slice of runes", because ASCII apparently does not support strings with Japanese, Korean, Chinese, or Arabic characters, which must be represented in Unicode or UTF-8 using runes?

Answer 1

Score: 4

> To my understanding, the byte data type in Go is equivalent to uint8 and represents ASCII characters, which works perfectly for strings that consist of English letters only.

No. A byte doesn't mean ASCII; it is just an 8-bit value (byte is an alias for uint8). Go doesn't use ASCII for anything.

Strings in Go are normally UTF-8. The string functions in the standard library all work with UTF-8. Accessing a string as a series of runes using range assumes that the string is UTF-8. UTF-8 is an encoding of Unicode into bytes. All of this is true regardless of which human language the text is written in.

Strings can also contain data that isn't UTF-8; as the article you quoted says, a string is basically just an immutable []byte, and can contain any sequence of bytes, including binary data and character data in encodings other than UTF-8. This is perfectly valid; it just doesn't make sense to use the functions in the strings package or range on these "strings". The types really only capture the difference between mutable and immutable; they don't capture the difference between "a character string" and "a bunch of bytes".
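As a rough illustration of that last point, here is a minimal sketch using only the standard unicode/utf8 package. The helper name describe and the sample byte values are made up for this example and do not come from the answer. It compares a valid UTF-8 string with a string holding arbitrary bytes, and then shows what range reports for the latter.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports the byte length of s, whether s is valid UTF-8,
// and how many runes a range loop over s yields.
// (Hypothetical helper, named here for illustration only.)
func describe(s string) (byteLen int, valid bool, runes int) {
	for range s {
		runes++
	}
	return len(s), utf8.ValidString(s), runes
}

func main() {
	text := "日本語"            // valid UTF-8 text
	raw := "\xff\xfe\x00\x01" // arbitrary binary data, not valid UTF-8

	for _, s := range []string{text, raw} {
		n, ok, r := describe(s)
		fmt.Printf("%q: %d bytes, valid UTF-8: %t, %d runes from range\n", s, n, ok, r)
	}

	// For a byte that cannot be decoded, range yields U+FFFD
	// (utf8.RuneError) and advances by one byte.
	for i, r := range raw {
		fmt.Printf("offset %d: %U\n", i, r)
	}
}
```

Both values have the same type, string; only utf8.ValidString tells them apart at run time.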

Answer 2

Score: 3

Yes, a string is a slice of bytes regardless of the character set. For example:

s := "селёдка"
fmt.Printf("%d\n", len(s))

will print 14 even though the word is 7 letters long, because each of these Cyrillic letters takes two bytes in UTF-8. That means you cannot, for example, use s[2] to get the third character; it returns a single byte.
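To make the byte/character distinction concrete, the following sketch (not part of the original answer) contrasts len with utf8.RuneCountInString and looks at what indexing returns. The helper firstRune is a hypothetical wrapper around utf8.DecodeRuneInString, introduced only for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune decodes the first UTF-8 encoded code point of s and returns it
// together with the number of bytes it occupies.
// (Hypothetical helper, named here for illustration only.)
func firstRune(s string) (rune, int) {
	return utf8.DecodeRuneInString(s)
}

func main() {
	s := "селёдка"

	fmt.Println(len(s))                    // length in bytes
	fmt.Println(utf8.RuneCountInString(s)) // length in code points

	// Indexing yields one byte (uint8), here the first of the two bytes
	// that encode the second letter, not the third letter.
	fmt.Printf("%T 0x%x\n", s[2], s[2])

	r, size := firstRune(s)
	fmt.Printf("%c occupies %d bytes\n", r, size)
}
```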

However, when you iterate over a string with range, you get runes:

s := "селёдка"
for _, c := range s {
    fmt.Printf("%c\n", c) // %c formats the rune as a character (%s is for strings)
}

will print the word letter by letter.
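One detail worth seeing in code is that the index produced by range is a byte offset, not a character position. The sketch below assumes an illustrative sample string mixing one-byte and three-byte characters; the helper runeOffsets is hypothetical and exists only for this example.

```go
package main

import "fmt"

// runeOffsets returns the byte offset at which each rune of s starts,
// as reported by a range loop.
// (Hypothetical helper, named here for illustration only.)
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "Go语言"
	for i, c := range s {
		// i is a byte offset into s; c is a rune (an alias for int32).
		fmt.Printf("offset %d: %c (%U)\n", i, c, c)
	}
	fmt.Println(runeOffsets(s))
}
```

The offsets skip ahead whenever a character needs more than one byte, which is why they cannot be used as character counts.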

If you want to deal with the runes directly, convert the string to a slice of runes:

r := []rune(s)
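A short sketch of what that conversion allows, under the assumption that one rune per letter is good enough for the text at hand (a rune is a Unicode code point, which is not always a full user-perceived character). The helper nthChar is hypothetical and not part of the answer.

```go
package main

import "fmt"

// nthChar returns the n-th rune (0-based) of s by converting s to a rune
// slice first. The conversion decodes and copies the whole string.
// ok is false if n is out of range.
// (Hypothetical helper, named here for illustration only.)
func nthChar(s string, n int) (r rune, ok bool) {
	runes := []rune(s)
	if n < 0 || n >= len(runes) {
		return 0, false
	}
	return runes[n], true
}

func main() {
	s := "селёдка"
	r := []rune(s)

	fmt.Println(len(r))       // number of runes, not bytes
	fmt.Println(string(r[2])) // third letter, converted back to a string

	r[0] = 'С'                // the rune slice is a mutable copy
	fmt.Println(string(r), s) // s itself is unchanged

	if c, ok := nthChar(s, 2); ok {
		fmt.Printf("%c\n", c)
	}
}
```

Because the conversion allocates a new slice, repeated indexing in a hot loop is usually better served by converting once and reusing the result.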
