March 20, 2015, 15:13:57 · go

How can I get the Unicode value of a character in go?

Question

I am trying to get the Unicode value of a character in a string in Go as an int.

I do this:

value = strconv.Itoa(int(([]byte(char))[0]))

where char is a string containing a single character.

That works for many cases. It doesn't work for umlauts like ä, ö, ü, Ä, Ö, Ü.

E.g. Ä results in 65, which is the same as for A.

How can I do that?

Supplement: I had two problems. The first was solved by any of the answers below. The second was a bit trickier: my input was not NFC-normalized UTF-8, e.g. umlauts were represented by two code points instead of one. As ANisus said, the solution is found in the package golang.org/x/text/unicode/norm. The line above is now two lines:

rune, _ := utf8.DecodeRune(norm.NFC.Bytes([]byte(char)))
value = strconv.Itoa(int(rune))

Any hints for making this shorter are welcome.

Answer 1

Score: 11

Strings are UTF-8 encoded, so to decode a character from a string and get its rune (Unicode code point), you can use the unicode/utf8 package.

Example:

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	str := "AÅÄÖ"

	// Decode one rune at a time until the string is consumed.
	for len(str) > 0 {
		// r is the code point, size is its encoded length in bytes.
		r, size := utf8.DecodeRuneInString(str)
		fmt.Printf("%d %v\n", r, size)

		str = str[size:]
	}
}

Result:

65 1
197 2
196 2
214 2

Edit: (To clarify Michael's supplement)

A character such as Ä can be represented by different sequences of Unicode code points:

Precomposed: Ä (U+00C4)
Using combining diaeresis: A (U+0041) + ¨ (U+0308)

In order to get the precomposed form, one can use the normalization package, golang.org/x/text/unicode/norm. The NFC (Canonical Decomposition, followed by Canonical Composition) form will turn U+0041 + U+0308 into U+00C4:

c := "\u0041\u0308"
r, _ := utf8.DecodeRune(norm.NFC.Bytes([]byte(c)))
fmt.Printf("%+q", r) // '\u00c4'
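The difference between the two forms can be checked with the standard library alone. The following is a minimal sketch, not part of the original answer; `describe` is a hypothetical helper introduced only for illustration, and it assumes a non-empty input string. It shows why taking the first byte of a decomposed Ä yields 65, and why taking the first byte of a precomposed Ä is not the code point either.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper (not from the original answer).
// It returns the first byte of s, the first code point of s, and the
// number of code points in s. It assumes s is not empty.
func describe(s string) (firstByte byte, first rune, count int) {
	first, _ = utf8.DecodeRuneInString(s)
	return s[0], first, utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00c4" // Ä as a single code point
	decomposed := "A\u0308" // A followed by a combining diaeresis

	for _, s := range []string{precomposed, decomposed} {
		b, r, n := describe(s)
		fmt.Printf("first byte %d, first code point %d, code points %d\n", b, r, n)
	}
	// Output:
	// first byte 195, first code point 196, code points 1
	// first byte 65, first code point 65, code points 2
}
```

Without normalization, decoding the decomposed form still gives 65 for the first rune, which is why the NFC step is needed before decoding.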

Answer 2

Score: 8

The "character" type in Go is rune, which is an alias for int32; see also Rune literals. A rune is an integer value identifying a Unicode code point.

In Go, strings are represented and stored as the UTF-8 encoded byte sequence of the text. The range form of the for loop iterates over the runes of the text:

s := "äöüÄÖÜ世界"
for _, r := range s {
	fmt.Printf("%c - %d\n", r, r)
}

Output:

ä - 228
ö - 246
ü - 252
Ä - 196
Ö - 214
Ü - 220
世 - 19990
界 - 30028

Try it on the Go Playground.

Read this blog article if you want to know more about the topic:

Strings, bytes, runes and characters in Go
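Because a rune is just an int32, a string can also be converted to a []rune and each element passed to strconv.Itoa, which is close to what the question was after. This is a minimal sketch rather than part of the original answer; `codePoints` is a hypothetical name chosen for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// codePoints is a hypothetical helper (not from the original answer).
// It returns the decimal code point of every rune in s as a string.
func codePoints(s string) []string {
	runes := []rune(s) // decodes the UTF-8 bytes into code points
	out := make([]string, 0, len(runes))
	for _, r := range runes {
		out = append(out, strconv.Itoa(int(r)))
	}
	return out
}

func main() {
	fmt.Println(codePoints("Äö世")) // prints [196 246 19990]
}
```

The conversion allocates a new slice, so for a single character or for long strings the range loop or utf8.DecodeRuneInString avoids the extra copy.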

Answer 3

Score: 6

You can use the unicode/utf8 package:

// r is used instead of rune to avoid shadowing the built-in type name.
r, _ := utf8.DecodeRuneInString("Ä")
fmt.Println(r) // 196
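The second return value, discarded above, is the encoded width in bytes, and it can be used to detect bad input. As a hedged sketch (the helper name `firstCodePoint` and its error message are assumptions, not part of the original answer): utf8.DecodeRuneInString returns utf8.RuneError with width 0 for an empty string and width 1 for an invalid encoding.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// firstCodePoint is a hypothetical helper (not from the original answer).
// It returns the first code point of s as an int, or an error when s is
// empty or does not start with valid UTF-8.
func firstCodePoint(s string) (int, error) {
	r, size := utf8.DecodeRuneInString(s)
	// A width of 0 means empty input; a width of 1 with RuneError means
	// an invalid encoding. A literal U+FFFD in the input has width 3.
	if r == utf8.RuneError && size <= 1 {
		return 0, errors.New("empty or invalid UTF-8 input")
	}
	return int(r), nil
}

func main() {
	fmt.Println(firstCodePoint("Ä"))    // 196 <nil>
	fmt.Println(firstCodePoint(""))     // 0 empty or invalid UTF-8 input
	fmt.Println(firstCodePoint("\xff")) // 0 empty or invalid UTF-8 input
}
```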
