Converting "=?UTF 8?.." (RFC 2047) to a regular string in golang
Question
I'm using an API and it's returning something like this for other language text:
=?UTF 8?B?2KfZhNiu2LfZiNin2Kog2KfZhNiq2Yog2KrYrNmF2Lkg2KjZitmG?= =?UTF 8?B?INit2YHYuCDYp9mE2YLYsdin2ZPZhiDYp9mE2YPYsdmK2YUg2YjZgQ==?= =?UTF 8?B?2YfZhdmHINmF2YXYpyDYp9mU2YXZhNin2Ycg2KfZhNi52YTYp9mF?= =?UTF 8?B?2Kkg2LnYqNivINin2YTZhNmHINin2YTYutiv2YrYp9mGLnBkZg==?=
Is this a common format? How would I go about converting this to a regular string in golang?
Golang usually handles multiple languages well, but I'm not sure about how to go about converting.
Answer 1
Score: 9
Since Go 1.5 you can use mime.WordDecoder.DecodeHeader:
package main

import (
	"fmt"
	"mime"
)

func main() {
	// The zero-value WordDecoder handles UTF-8, ISO-8859-1 and US-ASCII.
	dec := new(mime.WordDecoder)
	// DecodeHeader decodes every encoded-word in the string and drops the
	// whitespace that separates adjacent encoded-words.
	header, err := dec.DecodeHeader("=?UTF-8?B?2KfZhNiu2LfZiNin2Kog2KfZhNiq2Yog2KrYrNmF2Lkg2KjZitmG?= =?UTF-8?B?INit2YHYuCDYp9mE2YLYsdin2ZPZhiDYp9mE2YPYsdmK2YUg2YjZgQ==?= =?UTF-8?B?2YfZhdmHINmF2YXYpyDYp9mU2YXZhNin2Ycg2KfZhNi52YTYp9mF?= =?UTF-8?B?2Kkg2LnYqNivINin2YTZhNmHINin2YTYutiv2YrYp9mGLnBkZg==?=")
	if err != nil {
		panic(err)
	}
	fmt.Println(header)
	// Output: الخطوات التي تجمع بين حفظ القرآن الكريم وفهمه مما أملاه العلامة عبد الله الغديان.pdf
}
If you are using an older version of Go, you can use my replacement library: https://github.com/alexcesaro/quotedprintable
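One caveat: the string in the question spells the charset as UTF 8 (with a space), while the example above uses UTF-8. According to the mime package documentation, WordDecoder handles only utf-8, iso-8859-1 and us-ascii by itself and passes any other charset label to its CharsetReader field, so the unmodified API string would likely be rejected as an unhandled charset. A minimal sketch of accepting that label, assuming the payload really is UTF-8 (the function name decodeLenient is made up for illustration):

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"strings"
)

// decodeLenient decodes RFC 2047 encoded-words and additionally accepts the
// nonstandard charset label "UTF 8", treating it as UTF-8.
// The name is hypothetical, not part of the standard library.
func decodeLenient(header string) (string, error) {
	dec := &mime.WordDecoder{
		// CharsetReader is consulted only for charsets the decoder does
		// not handle on its own.
		CharsetReader: func(charset string, input io.Reader) (io.Reader, error) {
			if strings.EqualFold(charset, "utf 8") {
				// Assumption: the bytes are already UTF-8, so pass them through.
				return input, nil
			}
			return nil, fmt.Errorf("unsupported charset %q", charset)
		},
	}
	return dec.DecodeHeader(header)
}

func main() {
	s, err := decodeLenient("=?UTF 8?B?SGVsbG8=?= =?UTF 8?B?IFdvcmxk?=")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // Hello World
}
```

A simpler alternative would be to replace "=?UTF 8?" with "=?UTF-8?" in the input before decoding; the CharsetReader route just avoids rewriting the string.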
Answer 2
Score: 8
Apparently your API is returning data encoded in RFC 2047 format. Basically, this defines the following:
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
This means your charset is UTF-8 (very handy, since this is Go's native character set), and your encoding is Base64. The text you have to decode is the part between the "B?" and the "?=". Your string contains several encoded-words, so for each of them take that text and call:
base64.StdEncoding.DecodeString(text)
to get the original UTF-8 string.
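As a rough sketch of that manual approach, the following splits the input into encoded-words, Base64-decodes each one and concatenates the results. It assumes every word uses the B encoding and carries UTF-8 bytes (it ignores the charset label entirely, so the "UTF 8" spelling is accepted too), and the names decodeBWord and decodeAll are made up for illustration:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeBWord decodes one encoded-word of the form
// =?charset?B?encoded-text?= and returns the decoded bytes as a string.
// It does not convert charsets and does not support the Q encoding.
func decodeBWord(word string) (string, error) {
	if len(word) < 4 || !strings.HasPrefix(word, "=?") || !strings.HasSuffix(word, "?=") {
		return "", fmt.Errorf("not an encoded-word: %q", word)
	}
	// Strip the delimiters, leaving charset?encoding?encoded-text.
	parts := strings.SplitN(word[2:len(word)-2], "?", 3)
	if len(parts) != 3 || !strings.EqualFold(parts[1], "B") {
		return "", fmt.Errorf("unsupported encoded-word: %q", word)
	}
	raw, err := base64.StdEncoding.DecodeString(parts[2])
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

// decodeAll decodes a sequence of whitespace-separated B-encoded words.
// Splitting after "?=" rather than on spaces keeps a charset label that
// contains a space, such as "UTF 8", in one piece.
func decodeAll(s string) (string, error) {
	var out strings.Builder
	for _, piece := range strings.SplitAfter(s, "?=") {
		word := strings.TrimSpace(piece)
		if word == "" {
			continue
		}
		decoded, err := decodeBWord(word)
		if err != nil {
			return "", err
		}
		out.WriteString(decoded)
	}
	return out.String(), nil
}

func main() {
	s, err := decodeAll("=?UTF 8?B?SGVsbG8=?= =?UTF 8?B?IFdvcmxk?=")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // Hello World
}
```

This is only suitable for input known to look like the question's sample; anything with Q-encoded words, other charsets, or plain text mixed in needs a real RFC 2047 decoder.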
There is a decodeRFC2047Word() function in the net/mail package of the Go stdlib, supporting encodings B and Q and charsets UTF-8, US-ASCII and ISO-8859-1. Unfortunately it's not exported, but you're free to take as much inspiration from it as you need.
BTW: I just noticed the charset in your example strings is UTF 8, which is a bit odd, since the official name of the encoding is UTF-8.