How can I get the Unicode value of a character in go?

Question

I am trying to get the Unicode value of a character in a Go string as an int value.

I do this:

value = strconv.Itoa(int(([]byte(char))[0]))

where char is a string containing a single character.

That works in many cases, but not for umlauts such as ä, ö, ü, Ä, Ö, Ü.

For example, Ä results in 65, which is the same as for A.

How can I get the correct value?

Supplement: I had two problems. The first was solved by any of the answers below. The second was trickier: my input was not normalized UTF-8, e.g. umlauts were represented by two code points instead of one. As ANisus said, the solution is in the package golang.org/x/text/unicode/norm. The line above now becomes two lines:

r, _ := utf8.DecodeRune(norm.NFC.Bytes([]byte(char)))
value = strconv.Itoa(int(r))

Any hints to make this shorter are welcome.

Answer 1

Score: 11

Strings are UTF-8 encoded, so to decode a character from a string and get its rune (Unicode code point), you can use the unicode/utf8 package.

Example:

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	str := "AÅÄÖ"

	for len(str) > 0 {
		r, size := utf8.DecodeRuneInString(str)
		fmt.Printf("%d %v\n", r, size)

		str = str[size:]
	}
}

Result:

65 1
197 2
196 2
214 2

Edit: (To clarify Michael's supplement)

A character such as Ä can be represented by different sequences of Unicode code points:

Precomposed: Ä (U+00C4)
Using combining diaeresis: A (U+0041) + ¨ (U+0308)
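
To make the difference between the two forms visible, here is a minimal sketch using only the standard library. The helper name codePoints is made up for this illustration. It also shows why reading only the first byte of the decomposed form gives 65, as described in the question.

```go
package main

import "fmt"

// codePoints is a hypothetical helper for this illustration.
// Converting a string to []rune decodes its UTF-8 bytes into code points.
func codePoints(s string) []rune {
	return []rune(s)
}

func main() {
	precomposed := "\u00c4" // Ä as a single code point
	decomposed := "A\u0308" // A followed by a combining diaeresis

	fmt.Println(codePoints(precomposed), []byte(precomposed)) // [196] [195 132]
	fmt.Println(codePoints(decomposed), []byte(decomposed))   // [65 776] [65 204 136]

	// The two strings look the same when rendered but are not equal byte for byte.
	fmt.Println(precomposed == decomposed) // false
}
```

The first byte of the decomposed string is 65, the plain letter A, which matches the result reported in the question.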

To get the precomposed form, you can use the normalization package golang.org/x/text/unicode/norm. The NFC form (Canonical Decomposition, followed by Canonical Composition) turns U+0041 + U+0308 into U+00C4:

c := "\u0041\u0308"
r, _ := utf8.DecodeRune(norm.NFC.Bytes([]byte(c)))
fmt.Printf("%+q", r) // '\u00c4'

Answer 2

Score: 8

The "character" type in Go is the rune, which is an alias for int32 (see also Rune literals in the language specification). A rune is an integer value identifying a Unicode code point.

In Go, strings are represented and stored as the UTF-8 encoded byte sequence of the text. The range form of the for loop iterates over the runes of the text:

s := "äöüÄÖÜ世界"
for _, r := range s {
	fmt.Printf("%c - %d\n", r, r)
}

Output:

ä - 228
ö - 246
ü - 252
Ä - 196
Ö - 214
Ü - 220
世 - 19990
界 - 30028

Try it on the Go Playground.

Read this blog article if you want to know more about the topic:

Strings, bytes, runes and characters in Go
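
Since the question asks for the value as a decimal string, the range loop can be combined with strconv.Itoa. The following is only a sketch; the function name runeValues is an assumption made for this example and does not come from any package.

```go
package main

import (
	"fmt"
	"strconv"
)

// runeValues is a hypothetical helper: it returns the decimal code point
// of every rune in s, formatted as a string.
func runeValues(s string) []string {
	var out []string
	for _, r := range s { // range decodes one UTF-8 sequence per iteration
		out = append(out, strconv.Itoa(int(r)))
	}
	return out
}

func main() {
	fmt.Println(runeValues("Ä世")) // [196 19990]

	// A rune is an int32, so a rune literal can be used as a number directly.
	var r rune = 'A'
	fmt.Println(r + 1) // 66
}
```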

Answer 3

Score: 6

You can use the unicode/utf8 package:

r, _ := utf8.DecodeRuneInString("Ä")
fmt.Println(r)
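
The second return value, discarded above, is the number of bytes consumed, and it can be used to detect bad input. Below is a hedged sketch of a wrapper. The name firstCodePoint and its error message are assumptions for illustration, not part of unicode/utf8.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"unicode/utf8"
)

// firstCodePoint is a hypothetical helper: it returns the first code point
// of s as a decimal string, or an error for empty or invalid input.
func firstCodePoint(s string) (string, error) {
	r, size := utf8.DecodeRuneInString(s)
	// DecodeRuneInString reports (RuneError, 0) for an empty string and
	// (RuneError, 1) for an invalid encoding. A real U+FFFD has size 3.
	if r == utf8.RuneError && size <= 1 {
		return "", errors.New("empty string or invalid UTF-8")
	}
	return strconv.Itoa(int(r)), nil
}

func main() {
	fmt.Println(firstCodePoint("Ä"))    // 196 <nil>
	fmt.Println(firstCodePoint(""))     // reports an error
	fmt.Println(firstCodePoint("\xff")) // reports an error
}
```

Note that this does not normalize the input. A decomposed Ä would still yield 65, so the norm package mentioned in the first answer is still needed for such input.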

huangapple
  • Posted on March 20, 2015, 15:13:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/29161300.html