Decoding quoted-printable email in Golang
Question
When you type two spaces in a row in an HTML email in Gmail, the first one is encoded in the quoted-printable body as "=C2=A0 ", as you can see in the source of the email.
According to this Stack Overflow answer, because the body is UTF-8 encoded, this should become U+00A0 (NBSP) when decoded: https://stackoverflow.com/a/2774507
However, in Golang, that doesn't seem to be what happens:
s := `Text Text Text.=C2=A0 That's just two spaces`
r := strings.NewReader(s)
qpReader := quotedprintable.NewReader(r)
all, _ := ioutil.ReadAll(qpReader)
str := string(all)
fmt.Println(strings.Index(str, "\xC2\xA0"))
This outputs "15". Here's the Playground link: https://play.golang.org/p/8n6L7dlZPt
Instead of an NBSP at that position, the decoded string keeps the \xC2 byte and results in "Text Text Text That's just two spaces".
What's the best way to correctly decode this as U+00A0 (NBSP)?
Answer 1
Score: 0
As Volker explained in his comment, a Go string is simply a read-only sequence of bytes. In your case, the decoded bytes \xC2\xA0 are already the UTF-8 encoding of U+00A0, and UTF-8 is the encoding Go assumes for string text. To access the actual Unicode code points (runes in Go lingo), use something like:
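// Prints 15: the byte offset of U+00A0 in str.
fmt.Println(strings.IndexRune(str, '\u00A0'))
// Prints A0. Rune index 15 equals the byte offset here only because the preceding text is ASCII.
fmt.Printf("%X\n", []rune(str)[15])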
How to correctly render the string depends on where you want to render it. But in most cases, you can pass it as is since it's already in UTF-8.