英文:
Convert utf8 string to ISO-8859-1
问题
如何在Golang中将utf8字符串转换为ISO-8859-1?
我尝试过搜索,但只能找到相反的转换方法,而且我找到的几个解决方案都不起作用。
我需要将包含特殊丹麦字符的字符串转换,例如æ,ø和å。
ø => ø
等等。
英文:
How to convert a utf8 string to ISO-8859-1 in golang
Have tried to search but can only find conversions the other way and the few solutions I found didn't work
I need to convert string with special danish chars like æ, ø and å
ø => ø
etc.
答案1
得分: 4
请注意,ISO-8859-1 只支持与 Unicode 相比的一小部分字符。如果你确定你的 UTF-8 编码的字符串只包含 ISO-8859-1 支持的字符,你可以使用以下代码。
package main
import (
	"fmt"
	"golang.org/x/text/encoding/charmap"
)
func main() {
	str := "Räv"
	encoder := charmap.ISO8859_1.NewEncoder()
	out, err := encoder.Bytes([]byte(str))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", out)
}
上述代码输出:
52e476
因此,0x52、0xE4、0x76,与 https://en.wikipedia.org/wiki/ISO/IEC_8859-1 中的内容相符。特别要注意的是第二个字符,因为在 UTF-8 中它将被编码为 0xC3、0xA4。
如果字符串包含不受支持的字符,例如将 str 改为 "Räv🐱v",那么 encoder.Bytes([]byte(str)) 将返回一个错误:
panic: encoding: rune not supported by encoding.
goroutine 1 [running]:
main.main()
/Users/nj/Dev/scratch/main.go:15 +0x109
如果你希望接受无法转换的字符丢失,一个简单的解决方案是利用 EncodeRune,它返回一个布尔值,指示该符文是否在 charmap 的字符集中。
package main
import (
	"fmt"
	"golang.org/x/text/encoding/charmap"
)
func main() {
	str := "Räv🐱v"
	out := make([]byte, 0)
	for _, r := range str {
		if e, ok := charmap.ISO8859_1.EncodeRune(r); ok {
			out = append(out, e)
		}
	}
	fmt.Printf("%x\n", out)
}
上述代码输出:
52e47676
即表情符号已被删除。
英文:
Keep in mind that ISO-8859-1 only supports a tiny subset of characters compared to Unicode. If you know for certain that your UTF-8 encoded string only contains characters covered by ISO-8859-1, you can use the following code.
package main
import (
	"fmt"
	"golang.org/x/text/encoding/charmap"
)
func main() {
	str := "Räv"
	encoder := charmap.ISO8859_1.NewEncoder()
	out, err := encoder.Bytes([]byte(str))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", out)
}
The above prints:
52e476
So 0x52, 0xE4, 0x76, which looks correct as per https://en.wikipedia.org/wiki/ISO/IEC_8859-1 - in particular the second character is of note, since it would be encoded as 0xC3, 0xA4 in UTF-8.
If the string contains characters that aren't supported, e.g. we change str to be "Räv💩v", then an error is going to be returned by encoder.Bytes([]byte(str)):
panic: encoding: rune not supported by encoding.
goroutine 1 [running]:
main.main()
/Users/nj/Dev/scratch/main.go:15 +0x109
If you wish to address that by accepting loss of unconvertible characters, a simple solution might be to leverage EncodeRune, which returns a boolean to indicate if the rune is in the charmap's repertoire.
package main
import (
	"fmt"
	"golang.org/x/text/encoding/charmap"
)
func main() {
	str := "Räv💩v"
	out := make([]byte, 0)
	for _, r := range str {
		if e, ok := charmap.ISO8859_1.EncodeRune(r); ok {
			out = append(out, e)
		}
	}
	fmt.Printf("%x\n", out)
}
The above prints
52e47676
i.e. the emoji has been stripped.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论