golang, £ char causing weird Â character
Question
I have a function that generates a random string from a string of valid characters. I'm occasionally getting weird results when it selects a £ character.
I've reduced it to the following minimal example:
import "strings"

func foo() string {
	validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!£$%^&*"
	var result strings.Builder
	for i := 0; i < len(validChars); i++ {
		currChar := validChars[i]
		result.WriteString(string(currChar))
	}
	return result.String()
}
I would expect this to return:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!£$%^&*
But it doesn't; it produces:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!Â£$%^&*
                                                                  ^
where did you come from?
If I take the £ sign out of the original validChars string, that weird Â goes away:
func foo() string {
	validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!$%^&*"
	var result strings.Builder
	for i := 0; i < len(validChars); i++ {
		currChar := validChars[i]
		result.WriteString(string(currChar))
	}
	return result.String()
}
This produces:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!$%^&*
Answer 1
Score: 9
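A string is effectively a read-only []byte. (Strictly speaking, string is its own immutable type, not a type alias for []byte, but indexing a string gives you a byte, not a character.) Your mental model of a string is probably that it consists of a slice of characters, or, as we call them in Go, a slice of runes.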
For many runes in your validChars string this is fine, as they are ASCII characters and can therefore be represented in a single byte in UTF-8. However, the £ rune (U+00A3) is represented as 2 bytes: 0xC2 0xA3.
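A quick way to check how many bytes a character takes is to look at the string's bytes directly. A minimal sketch:

```go
package main

import "fmt"

func main() {
	// "a" is ASCII, so UTF-8 encodes it in a single byte.
	fmt.Println([]byte("a")) // [97]
	// "£" is U+00A3, which UTF-8 encodes as two bytes: 0xC2 0xA3.
	fmt.Printf("% x\n", "£") // c2 a3
	// len reports bytes, so a one-character string can have length 2.
	fmt.Println(len("£")) // 2
}
```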
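Now consider the string "£": it consists of 1 rune but 2 bytes. As I've mentioned, a string is really just a sequence of bytes. If we grab the first element, as you are effectively doing in your sample with validChars[i], we only get the first of the two bytes that represent £. When you convert that byte back to a string with string(currChar), Go treats its value as a Unicode code point: 0xC2 is U+00C2, which is Â. The second byte, 0xA3, is converted to U+00A3, which happens to be £ again. That is where the unexpected Â comes from.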
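The byte-by-byte conversion can be reproduced on the single character £. This sketch mirrors the loop from the question:

```go
package main

import "fmt"

func main() {
	s := "£"
	for i := 0; i < len(s); i++ {
		// s[i] is a byte: 0xC2 on the first pass, 0xA3 on the second.
		// string(s[i]) treats that byte value as a Unicode code point,
		// so 0xC2 becomes "Â" (U+00C2) and 0xA3 becomes "£" (U+00A3).
		fmt.Printf("byte 0x%X -> %q\n", s[i], string(s[i]))
	}
}
```

Concatenating the two results gives "Â£", which is exactly the extra character seen in the question's output.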
The fix for your problem is to first convert validChars to a []rune. Then you can access its individual runes (rather than bytes) by index, and foo will work as expected. You can see it in action in this playground.
Also note that len(validChars) gives you the number of bytes in the string. To get the number of runes, use utf8.RuneCountInString instead.
Finally, here's a blog post from Rob Pike on the subject that you may find interesting.