Converting Unicode to GSM encoding in Golang

Question

I am migrating a Python project to Go and need to convert UTF-8 text to the corresponding GSM characters where possible. I am very new to Go, so pointers to documentation or examples would be really helpful.

For example, ằ in Unicode becomes a after GSM encoding. Python snippet:

    # is_gsm, is_nonascii_utf8, rc, gsm and transliterated_text are defined elsewhere (not shown)
    for character in text:
        if is_gsm(character):
            transliterated_text += character.encode('utf-8')
            continue
        if is_nonascii_utf8(character):
            transliterated_char = unidecode.unidecode(character)
            if transliterated_char == '?' or transliterated_char == '':
                gsm = False
                break
            if transliterated_char != rc:
                character = transliterated_char
            transliterated_text += character
        else:
            transliterated_text += character.encode('utf-8')
    if gsm and is_gsm(transliterated_text.decode('utf-8')):
        text = transliterated_text.decode('utf-8')

Thanks.

Answer 1

Score: 2

You can do it this way:

    package main

    import (
        "fmt"
        "strings"
    )

    // utf8GsmChars maps every character whose GSM 03.38 code differs from its
    // ASCII code to the GSM byte sequence that represents it.
    var utf8GsmChars = map[rune]string{
        '@': "\x00", '£': "\x01", '$': "\x02",
        '¥': "\x03", 'è': "\x04", 'é': "\x05",
        'ù': "\x06", 'ì': "\x07", 'ò': "\x08",
        'Ç': "\x09", 'Ø': "\x0B", 'ø': "\x0C",
        'Å': "\x0E", 'å': "\x0F", 'Δ': "\x10",
        '_': "\x11", 'Φ': "\x12", 'Γ': "\x13",
        'Λ': "\x14", 'Ω': "\x15", 'Π': "\x16",
        'Ψ': "\x17", 'Σ': "\x18", 'Θ': "\x19",
        'Ξ': "\x1A", 'Æ': "\x1C", 'æ': "\x1D",
        'ß': "\x1E", 'É': "\x1F", '¤': "\x24",
        '¡': "\x40", 'Ä': "\x5B", 'Ö': "\x5C",
        'Ñ': "\x5D", 'Ü': "\x5E", '§': "\x5F",
        '¿': "\x60", 'ä': "\x7B", 'ö': "\x7C",
        'ñ': "\x7D", 'ü': "\x7E", 'à': "\x7F",
        // Extension table: escape byte 0x1B followed by a second byte.
        '^': "\x1B\x14", '{': "\x1B\x28",
        '}': "\x1B\x29", '\\': "\x1B\x2F",
        '[': "\x1B\x3C", '~': "\x1B\x3D",
        ']': "\x1B\x3E", '|': "\x1B\x40",
        '€': "\x1B\x65",
    }

    // gsmUtf8Chars is the inverse of utf8GsmChars. It is derived at start-up
    // so the two tables cannot drift apart.
    var gsmUtf8Chars = func() map[string]rune {
        m := make(map[string]rune, len(utf8GsmChars))
        for r, g := range utf8GsmChars {
            m[g] = r
        }
        return m
    }()

    // UTF8ToGsm0338 encodes text one rune at a time, so the output of one
    // mapping is never fed into another. Runes without a GSM code become '?'.
    func UTF8ToGsm0338(text string) string {
        var b strings.Builder
        for _, r := range text {
            if g, ok := utf8GsmChars[r]; ok {
                b.WriteString(g)
            } else if r == '\n' || r == '\r' || (r >= 0x20 && r < 0x7F && r != '`') {
                b.WriteRune(r) // same code in ASCII and GSM
            } else {
                b.WriteByte('?')
            }
        }
        return b.String()
    }

    // GSM0338ToUTF8 decodes byte by byte, trying a two-byte escape sequence
    // before the single-byte table. Bytes that are in neither table are
    // copied through unchanged.
    func GSM0338ToUTF8(text string) string {
        var b strings.Builder
        for i := 0; i < len(text); i++ {
            if text[i] == 0x1B && i+1 < len(text) {
                if r, ok := gsmUtf8Chars[text[i:i+2]]; ok {
                    b.WriteRune(r)
                    i++ // skip the byte consumed by the escape sequence
                    continue
                }
            }
            if r, ok := gsmUtf8Chars[text[i:i+1]]; ok {
                b.WriteRune(r)
            } else {
                b.WriteByte(text[i])
            }
        }
        return b.String()
    }

    func main() {
        s := "Hello World"
        gsm := UTF8ToGsm0338(s)
        utf8 := GSM0338ToUTF8(gsm)
        fmt.Printf("word before: %s\nword after gsm: %s\nword after utf8: %s\n", s, gsm, utf8)
    }
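
The answer covers the table lookup only. The transliteration step from the question (ằ -> a) could be sketched roughly as below. This is a minimal sketch under stated assumptions, not a port of unidecode: `isGSM`, `transliterate` and the tiny `fallback` table are hypothetical names introduced here for illustration, and a real project would need a much fuller transliteration table or library.

```go
package main

import (
	"fmt"
	"strings"
)

// Characters of the GSM 03.38 default alphabet and its extension table.
const gsmBasic = "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡" +
	"ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
const gsmExt = "\f^{}\\[~]|€"

// fallback is a hypothetical, deliberately tiny transliteration table.
// It stands in for what unidecode does in the Python snippet.
var fallback = map[rune]string{
	'\u1EB1': "a",  // ằ
	'\u1EBF': "e",  // ế
	'\u2013': "-",  // en dash
	'\u201C': "\"", // left double quotation mark
	'\u201D': "\"", // right double quotation mark
}

// isGSM reports whether r can be represented in GSM 03.38.
func isGSM(r rune) bool {
	return strings.ContainsRune(gsmBasic, r) || strings.ContainsRune(gsmExt, r)
}

// transliterate replaces every non-GSM rune with its fallback. If some rune
// has no fallback it returns the input unchanged and false, which mirrors
// the `gsm = False; break` branch of the Python snippet.
func transliterate(text string) (string, bool) {
	var b strings.Builder
	for _, r := range text {
		if isGSM(r) {
			b.WriteRune(r)
			continue
		}
		repl, found := fallback[r]
		if !found {
			return text, false
		}
		b.WriteString(repl)
	}
	return b.String(), true
}

func main() {
	out, ok := transliterate("Ch\u00E0o b\u1EB1ng") // "Chào bằng"
	fmt.Println(out, ok)
}
```

Here à is kept because it already exists in the GSM alphabet, while ằ is replaced by its fallback. The returned text can then be passed to a GSM encoder such as the `UTF8ToGsm0338` function from the answer.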

huangapple
  • Posted on July 28, 2022 at 17:28:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/73150438.html