2022年10月22日 18:12:48go评论153阅读模式

英文:

Convert utf8 string to ISO-8859-1

问题

如何在Golang中将utf8字符串转换为ISO-8859-1？

我尝试过搜索，但只能找到相反的转换方法，而且我找到的几个解决方案都不起作用。

我需要将包含特殊丹麦字符的字符串转换，例如æ，ø和å。

Ã¸ => ø
等等。

英文:

How to convert a utf8 string to ISO-8859-1 in golang

Have tried to search but can only find conversions the other way and the few solutions I found didn't work

I need to convert string with special danish chars like æ, ø and å

Ã¸ => ø
etc.

答案1

得分: 4

请注意，ISO-8859-1 只支持与 Unicode 相比的一小部分字符。如果你确定你的 UTF-8 编码的字符串只包含 ISO-8859-1 支持的字符，你可以使用以下代码。

package main
import (
	"fmt"
	"golang.org/x/text/encoding/charmap"
)
func main() {
	str := "Räv"
	encoder := charmap.ISO8859_1.NewEncoder()
	out, err := encoder.Bytes([]byte(str))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", out)
}

上述代码输出：

52e476

因此，0x52、0xE4、0x76，与 https://en.wikipedia.org/wiki/ISO/IEC_8859-1 中的内容相符。特别要注意的是第二个字符，因为在 UTF-8 中它将被编码为 0xC3、0xA4。

如果字符串包含不受支持的字符，例如将 str 改为 "Räv🐱v"，那么 encoder.Bytes([]byte(str)) 将返回一个错误：

panic: encoding: rune not supported by encoding.
goroutine 1 [running]:
main.main()
/Users/nj/Dev/scratch/main.go:15 +0x109

如果你希望接受无法转换的字符丢失，一个简单的解决方案是利用 EncodeRune，它返回一个布尔值，指示该符文是否在 charmap 的字符集中。

package main
import (
	"fmt"
	"golang.org/x/text/encoding/charmap"
)
func main() {
	str := "Räv🐱v"
	out := make([]byte, 0)
	for _, r := range str {
		if e, ok := charmap.ISO8859_1.EncodeRune(r); ok {
			out = append(out, e)
		}
	}
	fmt.Printf("%x\n", out)
}

上述代码输出：

52e47676

即表情符号已被删除。

英文:

Keep in mind that ISO-8859-1 only supports a tiny subset of characters compared to Unicode. If you know for certain that your UTF-8 encoded string only contains characters covered by ISO-8859-1, you can use the following code.

package main
import (
	&quot;fmt&quot;
	&quot;golang.org/x/text/encoding/charmap&quot;
)
func main() {
	str := &quot;R&#228;v&quot;
	encoder := charmap.ISO8859_1.NewEncoder()
	out, err := encoder.Bytes([]byte(str))
	if err != nil {
		panic(err)
	}
	fmt.Printf(&quot;%x\n&quot;, out)
}

The above prints:

52e476

So 0x52, 0xE4, 0x76, which looks correct as per https://en.wikipedia.org/wiki/ISO/IEC_8859-1 - in particular the second character is of note, since it would be encoded as 0xC3, 0xA4 in UTF-8.

If the string contains characters that aren't supported, e.g. we change str to be "Räv💩v", then an error is going to be returned by encoder.Bytes([]byte(str)):

panic: encoding: rune not supported by encoding.
goroutine 1 [running]:
main.main()
/Users/nj/Dev/scratch/main.go:15 +0x109

If you wish to address that by accepting loss of unconvertible characters, a simple solution might be to leverage EncodeRune, which returns a boolean to indicate if the rune is in the charmap's repertoire.

package main
import (
	&quot;fmt&quot;
	&quot;golang.org/x/text/encoding/charmap&quot;
)
func main() {
	str := &quot;R&#228;v&#128169;v&quot;
	out := make([]byte, 0)
	for _, r := range str {
		if e, ok := charmap.ISO8859_1.EncodeRune(r); ok {
			out = append(out, e)
		}
	}
	fmt.Printf(&quot;%x\n&quot;, out)
}

The above prints

52e47676

i.e. the emoji has been stripped.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

将UTF-8字符串转换为ISO-8859-1编码。

问题

答案1

Go编译错误：“未定义的函数”

从映射和结构体中编组的JSON中的排序差异

读取字节数组直到换行符 – golang

如何在golang中创建一个自动调用的结构体方法？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。