Converting "=?UTF 8?.." (RFC 2047) to a regular string in golang



I'm using an API that returns something like this for non-English text:

=?UTF 8?B?2KfZhNiu2LfZiNin2Kog2KfZhNiq2Yog2KrYrNmF2Lkg2KjZitmG?= =?UTF 8?B?INit2YHYuCDYp9mE2YLYsdin2ZPZhiDYp9mE2YPYsdmK2YUg2YjZgQ==?= =?UTF 8?B?2YfZhdmHINmF2YXYpyDYp9mU2YXZhNin2Ycg2KfZhNi52YTYp9mF?= =?UTF 8?B?2Kkg2LnYqNivINin2YTZhNmHINin2YTYutiv2YrYp9mGLnBkZg==?=

Is this a common format? How would I go about converting this to a regular string in golang?

Go usually handles multiple languages well, but I'm not sure how to convert this.


Score: 9

Since Go 1.5 you can use mime.WordDecoder.DecodeHeader:

    package main

    import (
        "fmt"
        "mime"
    )

    func main() {
        // WordDecoder decodes RFC 2047 encoded-words; the zero value is ready to use.
        dec := new(mime.WordDecoder)
        header, err := dec.DecodeHeader("=?UTF-8?B?2KfZhNiu2LfZiNin2Kog2KfZhNiq2Yog2KrYrNmF2Lkg2KjZitmG?= =?UTF-8?B?INit2YHYuCDYp9mE2YLYsdin2ZPZhiDYp9mE2YPYsdmK2YUg2YjZgQ==?= =?UTF-8?B?2YfZhdmHINmF2YXYpyDYp9mU2YXZhNin2Ycg2KfZhNi52YTYp9mF?= =?UTF-8?B?2Kkg2LnYqNivINin2YTZhNmHINin2YTYutiv2YrYp9mGLnBkZg==?=")
        if err != nil {
            panic(err)
        }
        fmt.Println(header)
        // Output: الخطوات التي تجمع بين حفظ القرآن الكريم وفهمه مما أملاه العلامة عبد الله الغديان.pdf
    }

If you are using an older version of Go, you can use my replacement library: https://github.com/alexcesaro/quotedprintable


Score: 8

Apparently your API is returning data encoded in the RFC 2047 format. Basically, this defines the following:

    encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

Which means your charset is UTF-8 (very handy, since this is Go's native character set), and your encoding is Base64. The text you have to decode is the one between the "B?" and the "?=". So all you have to do is take that text and call:

    base64.StdEncoding.DecodeString(text)

to get the original UTF-8 string.
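As a rough sketch of that manual approach, the helpers below split each encoded-word on "?" and Base64-decode the payload. The names decodeBWord and decodeBWords are made up for illustration. The sketch assumes the B encoding only, a UTF-8 payload, and encoded-words that contain no spaces (so a charset written as "UTF 8" would have to be normalized to "UTF-8" first).

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// decodeBWord decodes one "=?charset?B?text?=" encoded-word.
// Hypothetical helper: it ignores the charset and assumes the payload is UTF-8.
func decodeBWord(word string) (string, error) {
	// "=?UTF-8?B?text?=" splits into: "=", charset, encoding, text, "=".
	parts := strings.Split(word, "?")
	if len(parts) != 5 || parts[0] != "=" || parts[4] != "=" {
		return "", errors.New("not an RFC 2047 encoded-word")
	}
	if !strings.EqualFold(parts[2], "B") {
		return "", errors.New("only the B (Base64) encoding is handled here")
	}
	raw, err := base64.StdEncoding.DecodeString(parts[3])
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

// decodeBWords decodes whitespace-separated encoded-words and concatenates
// the results; the whitespace between adjacent encoded-words is dropped.
func decodeBWords(s string) (string, error) {
	var sb strings.Builder
	for _, w := range strings.Fields(s) {
		part, err := decodeBWord(w)
		if err != nil {
			return "", err
		}
		sb.WriteString(part)
	}
	return sb.String(), nil
}

func main() {
	out, err := decodeBWords("=?UTF-8?B?aGVs?= =?UTF-8?B?bG8=?=")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // hello
}
```

This is only meant to show the mechanics; it does not handle the Q encoding, other charsets, or plain text mixed in with encoded-words.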

There is a decodeRFC2047Word() function in the net/mail package of the Go stdlib, supporting encodings B and Q and charsets UTF-8, US-ASCII and ISO-8859-1. Unfortunately it's not exported, but you're free to take as much inspiration from it as you need.
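For comparison, here is a small sketch of the same encoding and charset combinations going through the exported mime.WordDecoder (available since Go 1.5, as mentioned in the other answer) rather than the unexported net/mail function. The wrapper name decodeWord and the sample inputs are made up for illustration.

```go
package main

import (
	"fmt"
	"mime"
)

// decodeWord decodes a single RFC 2047 encoded-word using the exported
// mime.WordDecoder. Hypothetical wrapper, for illustration only.
func decodeWord(word string) (string, error) {
	dec := new(mime.WordDecoder)
	return dec.Decode(word)
}

func main() {
	words := []string{
		"=?UTF-8?B?aGVsbG8=?=",      // B encoding (Base64), UTF-8
		"=?ISO-8859-1?Q?caf=E9?=",   // Q encoding, ISO-8859-1 byte 0xE9
		"=?US-ASCII?Q?plain_text?=", // Q encoding, "_" stands for a space
	}
	for _, w := range words {
		s, err := decodeWord(w)
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
	// Prints:
	// hello
	// café
	// plain text
}
```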

BTW: I just noticed the charset in your example strings is UTF 8, which is a bit odd, since the official name of the encoding is UTF-8.
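If the API really does send the charset as "UTF 8" (rather than the space being a copy/paste artifact), one possible workaround is to give mime.WordDecoder a CharsetReader that accepts that label. The sketch below assumes the payload is already valid UTF-8; newLenientDecoder is a made-up helper name, and the short sample header is invented for the example.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"strings"
)

// newLenientDecoder returns a WordDecoder that treats the non-standard
// charset label "utf 8" as UTF-8. Hypothetical helper, for illustration.
func newLenientDecoder() *mime.WordDecoder {
	return &mime.WordDecoder{
		// CharsetReader is consulted for charsets the decoder does not
		// handle itself (anything other than utf-8, iso-8859-1, us-ascii).
		CharsetReader: func(charset string, input io.Reader) (io.Reader, error) {
			if strings.EqualFold(charset, "utf 8") {
				// Assumption: the bytes are already UTF-8, so pass them through.
				return input, nil
			}
			return nil, fmt.Errorf("unhandled charset %q", charset)
		},
	}
}

func main() {
	dec := newLenientDecoder()
	s, err := dec.DecodeHeader("=?UTF 8?B?aGVs?= =?UTF 8?B?bG8=?=")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // hello
}
```

A simpler alternative would be to rewrite "=?UTF 8?" to "=?UTF-8?" with strings.ReplaceAll before decoding; the CharsetReader route just keeps the input untouched.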

  • Posted on March 9, 2015, 05:30:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/28932140.html



