Golang: How to correctly parse UTF-8 string from C

Question

I'm new to Go, so maybe this is obvious.

I have a Go function which I'm exposing to C using go build -buildmode=c-shared and the corresponding //export funcName comment.
(You can see it here: https://github.com/udl/bmatch/blob/master/ext/levenshtein.go#L42)

My conversion currently works like this:

func distance(s1in, s2in *C.char) int {
	s1 := C.GoString(s1in)
	s2 := C.GoString(s2in)

How would I handle UTF-8 input here?
I've seen there is a UTF-8 package but I don't quite get how it works. https://golang.org/pkg/unicode/utf8/

Thank you!

Answer 1

Score: 6

You don't need to do anything special. UTF-8 is Go's "native" character encoding, so you can use the functions from the utf8 package you mentioned, e.g. utf8.RuneCountInString to get the number of Unicode runes in a string. Keep in mind that len(s) will still return the number of bytes in the string.
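To make the byte/rune distinction concrete, here is a minimal sketch. It uses plain Go strings in place of the values returned by C.GoString so that it runs without cgo. The helper toRunes is a hypothetical name, not something from the question's code. It assumes you want to reject invalid input: C.GoString copies the bytes as-is and does not check that they are valid UTF-8, so the check is done explicitly with utf8.ValidString.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// toRunes is a hypothetical helper: it checks that s is valid UTF-8 and
// returns its code points, so a caller can index by character instead of by byte.
func toRunes(s string) ([]rune, error) {
	if !utf8.ValidString(s) {
		return nil, fmt.Errorf("input is not valid UTF-8: %q", s)
	}
	return []rune(s), nil
}

func main() {
	// Stands in for s1 := C.GoString(s1in); \u00e9 is "é", two bytes in UTF-8.
	s := "h\u00e9llo"

	fmt.Println(len(s))                    // 6 (bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 5 (runes)

	r, err := toRunes(s)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(r)) // 5; r[1] is the rune 'é'
}
```

For something like an edit distance, comparing the []rune slices element by element rather than indexing the strings directly would probably give per-character results instead of per-byte ones.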

See this post in the official blog or this article for some details.

huangapple
  • Posted on October 1, 2015, 00:02:12
  • Please keep this link when reposting: https://go.coder-hub.com/32870736.html