英文:
Convert utf8 string to ISO-8859-1
问题
如何在Golang中将utf8
字符串转换为ISO-8859-1
?
我尝试过搜索,但只能找到相反的转换方法,而且我找到的几个解决方案都不起作用。
我需要将包含特殊丹麦字符的字符串转换,例如æ
,ø
和å
。
ø
=> ø
等等。
英文:
How to convert a utf8
string to ISO-8859-1
in golang
Have tried to search but can only find conversions the other way and the few solutions I found didn't work
I need to convert string with special danish chars like æ
, ø
and å
ø
=> ø
etc.
答案1
得分: 4
请注意,ISO-8859-1
只支持与 Unicode 相比的一小部分字符。如果你确定你的 UTF-8
编码的字符串只包含 ISO-8859-1
支持的字符,你可以使用以下代码。
package main
import (
"fmt"
"golang.org/x/text/encoding/charmap"
)
func main() {
str := "Räv"
encoder := charmap.ISO8859_1.NewEncoder()
out, err := encoder.Bytes([]byte(str))
if err != nil {
panic(err)
}
fmt.Printf("%x\n", out)
}
上述代码输出:
52e476
因此,0x52
、0xE4
、0x76
,与 https://en.wikipedia.org/wiki/ISO/IEC_8859-1 中的内容相符。特别要注意的是第二个字符,因为在 UTF-8
中它将被编码为 0xC3
、0xA4
。
如果字符串包含不受支持的字符,例如将 str
改为 "Räv🐱v"
,那么 encoder.Bytes([]byte(str))
将返回一个错误:
panic: encoding: rune not supported by encoding.
goroutine 1 [running]:
main.main()
/Users/nj/Dev/scratch/main.go:15 +0x109
如果你希望接受无法转换的字符丢失,一个简单的解决方案是利用 EncodeRune
,它返回一个布尔值,指示该符文是否在 charmap 的字符集中。
package main
import (
"fmt"
"golang.org/x/text/encoding/charmap"
)
func main() {
str := "Räv🐱v"
out := make([]byte, 0)
for _, r := range str {
if e, ok := charmap.ISO8859_1.EncodeRune(r); ok {
out = append(out, e)
}
}
fmt.Printf("%x\n", out)
}
上述代码输出:
52e47676
即表情符号已被删除。
英文:
Keep in mind that ISO-8859-1
only supports a tiny subset of characters compared to Unicode. If you know for certain that your UTF-8
encoded string only contains characters covered by ISO-8859-1
, you can use the following code.
package main
import (
"fmt"
"golang.org/x/text/encoding/charmap"
)
func main() {
str := "Räv"
encoder := charmap.ISO8859_1.NewEncoder()
out, err := encoder.Bytes([]byte(str))
if err != nil {
panic(err)
}
fmt.Printf("%x\n", out)
}
The above prints:
52e476
So 0x52
, 0xE4
, 0x76
, which looks correct as per https://en.wikipedia.org/wiki/ISO/IEC_8859-1 - in particular the second character is of note, since it would be encoded as 0xC3
, 0xA4
in UTF-8
.
If the string contains characters that aren't supported, e.g. we change str
to be "Räv💩v"
, then an error is going to be returned by encoder.Bytes([]byte(str))
:
panic: encoding: rune not supported by encoding.
goroutine 1 [running]:
main.main()
/Users/nj/Dev/scratch/main.go:15 +0x109
If you wish to address that by accepting loss of unconvertible characters, a simple solution might be to leverage EncodeRune
, which returns a boolean to indicate if the rune is in the charmap's repertoire.
package main
import (
"fmt"
"golang.org/x/text/encoding/charmap"
)
func main() {
str := "Räv💩v"
out := make([]byte, 0)
for _, r := range str {
if e, ok := charmap.ISO8859_1.EncodeRune(r); ok {
out = append(out, e)
}
}
fmt.Printf("%x\n", out)
}
The above prints
52e47676
i.e. the emoji has been stripped.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论