2021-07-23 19:00:10 · go

golang, £ char causing weird Â character

Question

I have a function that generates a random string from a string of valid characters. I'm occasionally getting weird results when it selects a £.

I've reproduced it to the following minimal example:

func foo() string {
	validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!£$%^&*"
	var result strings.Builder

	for i := 0; i < len(validChars); i++ {

		currChar := validChars[i]
		result.WriteString(string(currChar))
	}
	return result.String()
}

I would expect this to return

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!£$%^&*

But it doesn't, it produces

abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!Â£$%^&*
                                                                  ^
                                             where did you come from ?

If I take the £ sign out of the original validChars string, that weird Â goes away.

func foo() string {
	validChars := "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!$%^&*"
	var result strings.Builder

	for i := 0; i < len(validChars); i++ {

		currChar := validChars[i]
		result.WriteString(string(currChar))
	}
	return result.String()
}

This produces
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~@:!$%^&*

Answer 1

Score: 9

A Go string is a read-only sequence of bytes, and indexing it with s[i] yields a byte. Your mental model of a string is probably that it consists of a slice of characters or, as we call it in Go, a slice of rune.

For many runes in your validChars string this is fine, as they are part of the ASCII chars and can therefore be represented in a single byte in UTF-8. However, the £ rune is represented as 2 bytes.
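To make the one-byte versus two-byte difference concrete, here is a minimal sketch that decodes the first rune of "a" and of "£" and prints the bytes behind each. The helper name encodedBytes is made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodedBytes (hypothetical helper) returns the raw UTF-8 bytes that make up s.
func encodedBytes(s string) []byte {
	return []byte(s)
}

func main() {
	for _, s := range []string{"a", "£"} {
		// DecodeRuneInString reports the first rune and how many bytes it occupies.
		r, size := utf8.DecodeRuneInString(s)
		fmt.Printf("%q: rune U+%04X, %d byte(s): % x\n", s, r, size, encodedBytes(s))
	}
}
```

The ASCII letter occupies one byte (61), while £ occupies two (c2 a3).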

Now consider the string £: it consists of 1 rune but 2 bytes (0xC2 0xA3 in UTF-8). Because indexing a string yields bytes, grabbing the first element, as your sample effectively does, returns only the first of the two bytes that represent £. Converting that single byte with string(currChar) treats its value as a Unicode code point, so 0xC2 becomes Â (U+00C2) and the following 0xA3 becomes £ (U+00A3), which is why the output contains Â£.
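The byte-to-string step can be reproduced in isolation. This is a small sketch rather than the asker's exact code; byteAsString is an illustrative name that wraps the same string(currChar) conversion used in foo.

```go
package main

import "fmt"

// byteAsString (hypothetical helper) mirrors string(validChars[i]) from the
// question: the byte value is interpreted as a Unicode code point and then
// re-encoded as UTF-8.
func byteAsString(b byte) string {
	return string(b)
}

func main() {
	s := "£" // stored as the two bytes 0xC2 0xA3
	for i := 0; i < len(s); i++ {
		fmt.Printf("s[%d] = 0x%X -> %q\n", i, s[i], byteAsString(s[i]))
	}
}
```

Each byte turns into its own character, so one £ in the input becomes two characters in the output.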

The fix for your problem is to first convert string validChars to a []rune. Then, you can access its individual runes (rather than bytes) by index, and foo will work as expected. You can see it in action in this playground.

Also note that len(validChars) will give you the count of bytes in the string. To get the count of runes, use utf8.RuneCountInString instead.
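The difference between the two counts is easy to check on the symbol portion of the string. In this sketch, counts is an illustrative helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts (hypothetical helper) returns the byte length and the rune count of s.
func counts(s string) (byteLen, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := counts("~@:!£$%^&*")
	fmt.Printf("bytes=%d runes=%d\n", b, r) // bytes=11 runes=10
}
```

The byte count is one higher than the rune count because £ is the only two-byte character in that string.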

Finally, Rob Pike's post on the Go blog, "Strings, bytes, runes and characters in Go", covers this subject and may be of interest.
