How to get a single Unicode character from a string

Question

How can I get a single Unicode character from a string? For example, if the string is "你好", how can I get the first character, "你"?

I found one approach elsewhere:

var str = "你好"
runes := []rune(str)
fmt.Println(string(runes[0]))

This works, but I still have some questions:

  1. Is there another way to do it?

  2. Why does str[0] in Go return a byte rather than a Unicode character?

Answer 1

Score: 44

First, you may want to read https://blog.golang.org/strings, which answers part of your question.

A string in Go can contain arbitrary bytes. When you write str[i], the result is a byte, and the index i is always a byte offset.
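To make the byte-versus-character distinction concrete, here is a minimal sketch using the question's string. The helper name byteAndRuneCounts is made up for illustration; len and utf8.RuneCountInString are standard.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts (hypothetical helper) returns the length of s
// in bytes and in runes.
func byteAndRuneCounts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	str := "你好"
	b, r := byteAndRuneCounts(str)
	fmt.Println(b, r) // 6 2: each of these two characters takes 3 bytes in UTF-8

	// Indexing yields a single byte of the encoding, not the character.
	fmt.Printf("%#x\n", str[0]) // 0xe4

	// Slicing on the right byte boundaries recovers the whole character.
	fmt.Printf("%q\n", str[0:3]) // "你"
}
```

Slicing like this only works if you already know where each character starts and ends, which is what the approaches below take care of.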

In practice, though, most strings are UTF-8 encoded, and Go gives you several ways to work with UTF-8 text in a string.

For instance, you can use the for...range statement to iterate over a string rune by rune.

var first rune
for _, c := range str {
    first = c
    break
}
// first now contains the first rune of the string
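One detail of the loop above is easy to miss: the first value produced by range is the byte offset at which each rune starts, not a rune counter. A small sketch (runeOffsets is a hypothetical helper name) shows this:

```go
package main

import "fmt"

// runeOffsets (hypothetical helper) returns the byte offset at which
// each rune of s starts.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	// 'a' takes 1 byte; '你' and '好' take 3 bytes each.
	for i, c := range "a你好" {
		fmt.Printf("offset %d: %c\n", i, c)
	}
	fmt.Println(runeOffsets("a你好")) // [0 1 4]
}
```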

You can also leverage the unicode/utf8 package. For instance:

r, size := utf8.DecodeRuneInString(str)
// r contains the first rune of the string
// size is the size of the rune in bytes
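The snippet above does not say what happens with an empty or malformed string. As a sketch, the wrapper below (firstRune is a hypothetical name) relies on the documented behavior of utf8.DecodeRuneInString, which returns utf8.RuneError with size 0 for an empty string and size 1 for an invalid encoding:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune (hypothetical helper) returns the first rune of s and its
// size in bytes. ok is false when s is empty or does not start with a
// valid UTF-8 sequence.
func firstRune(s string) (r rune, size int, ok bool) {
	r, size = utf8.DecodeRuneInString(s)
	// A correctly encoded U+FFFD also decodes to utf8.RuneError, but
	// with size 3, so the size is checked to tell the cases apart.
	if r == utf8.RuneError && size <= 1 {
		return r, size, false
	}
	return r, size, true
}

func main() {
	r, size, ok := firstRune("你好")
	fmt.Println(string(r), size, ok) // 你 3 true

	_, _, ok = firstRune("")
	fmt.Println(ok) // false
}
```

Unlike converting with []rune(str), this decodes only the first character and does not allocate a slice for the whole string.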

If the string is encoded in UTF-8, there is no direct way to access the nth rune of the string, because the size of the runes (in bytes) is not constant. If you need this feature, you can easily write your own helper function to do it (with for...range, or with the unicode/utf8 package).
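A helper of the kind described above might look like the following sketch. The name nthRune and its (rune, bool) return shape are illustrative choices, not an existing API. It keeps its own rune counter, because the index produced by range is a byte offset.

```go
package main

import "fmt"

// nthRune (hypothetical helper) returns the n-th rune of s, counting
// from 0. The second result is false if s has fewer than n+1 runes.
func nthRune(s string, n int) (rune, bool) {
	if n < 0 {
		return 0, false
	}
	count := 0
	for _, r := range s {
		if count == n {
			return r, true
		}
		count++
	}
	return 0, false
}

func main() {
	r, ok := nthRune("你好", 1)
	fmt.Println(string(r), ok) // 好 true

	_, ok = nthRune("你好", 2)
	fmt.Println(ok) // false
}
```

The lookup takes time proportional to n. If you need many lookups on the same string, converting once with []rune(s) and indexing the slice trades memory for constant-time access.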

Answer 2

Score: 2

You can use the utf8string package:

package main
import "golang.org/x/exp/utf8string"

func main() {
   s := utf8string.NewString("ÄÅàâäåçèéêëìîïü")
   // example 1
   r := s.At(1)
   println(r == 'Å')
   // example 2
   t := s.Slice(1, 3)
   println(t == "Åà")
}

https://pkg.go.dev/golang.org/x/exp/utf8string

Answer 3

Score: -2

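You can do this:

package main

import "fmt"

func main() {
  str := "cat"
  var s rune
  // i is the byte offset at which each rune starts, not a rune count.
  for i, c := range str {
    if i == 2 {
      s = c
    }
  }
  fmt.Println(string(s)) // prints t
}

s is now equal to 't'. Because i is a byte offset, it matches the character position only when every preceding character is a single byte, as in ASCII text. For a string such as "你好", the offsets are 0 and 3.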

huangapple
  • Posted on May 15, 2015, 23:44:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/30263607.html