Indexing string as chars

Question

> The elements of strings have type byte and may be accessed using the
> usual indexing operations.

How can I get an element of a string as a character?

> "some"[1] -> "o"

Answer 1

Score: 10

The simplest solution is to convert it to a slice of runes:

var runes = []rune("someString")
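Once the string is a rune slice, ordinary indexing returns code points rather than bytes. The sketch below wraps that in a bounds-checked helper; `runeAt` is a hypothetical name used for illustration, not a standard library function.

```go
package main

import "fmt"

// runeAt returns the i-th rune (code point) of s, counting from zero.
// Hypothetical helper for illustration; ok is false when i is out of range.
func runeAt(s string, i int) (r rune, ok bool) {
	runes := []rune(s) // decodes the whole string: O(n) time and memory
	if i < 0 || i >= len(runes) {
		return 0, false
	}
	return runes[i], true
}

func main() {
	r, ok := runeAt("some", 1)
	fmt.Println(string(r), ok) // prints: o true

	// Byte indexing and rune indexing differ for non-ASCII text.
	s := "日本語"
	fmt.Println(s[1])                 // one byte from the UTF-8 encoding of 日
	fmt.Println(string([]rune(s)[1])) // prints: 本
}
```

The conversion copies and decodes the entire string, so for repeated lookups it is probably better to convert once and keep the slice.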

Note that when you iterate over a string with range, you don't need the conversion. See this example from Effective Go:

for pos, char := range "日本語" {
    fmt.Printf("character %c starts at byte position %d\n", char, pos)
}

This prints:

character 日 starts at byte position 0
character 本 starts at byte position 3
character 語 starts at byte position 6
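The same range loop can fetch a single rune by index without allocating a rune slice. This is a minimal sketch; `nthRune` is a hypothetical helper name, and it still has to scan from the start of the string on every call.

```go
package main

import "fmt"

// nthRune walks s with range and returns the n-th rune (zero-based)
// together with the byte offset where it starts.
// Hypothetical helper for illustration; ok is false if s has fewer runes.
func nthRune(s string, n int) (r rune, pos int, ok bool) {
	i := 0
	for p, c := range s {
		if i == n {
			return c, p, true
		}
		i++
	}
	return 0, 0, false
}

func main() {
	r, pos, ok := nthRune("日本語", 2)
	fmt.Printf("%c %d %v\n", r, pos, ok) // prints: 語 6 true
}
```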

Answer 2

Score: 4

Go strings are usually, but not necessarily, UTF-8 encoded. When they do hold Unicode text, the term "character" is fairly complex: there is no general or unique bijection between runes (code points) and Unicode characters.
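To illustrate why runes and characters do not map one to one, here is a small sketch comparing two spellings of "é": a single precomposed code point, and "e" followed by a combining accent. `counts` is a hypothetical helper introduced only for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts reports the number of bytes and the number of runes in s.
// Hypothetical helper for illustration.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9" // é as a single code point
	combining := "e\u0301"  // e followed by COMBINING ACUTE ACCENT

	fmt.Println(counts(precomposed))      // prints: 2 1
	fmt.Println(counts(combining))        // prints: 3 2
	fmt.Println(precomposed == combining) // prints: false

	// A Go string may also hold bytes that are not valid UTF-8.
	fmt.Println(utf8.ValidString("\xff")) // prints: false
}
```

Both strings render as one visible character, yet indexing the rune slice of the second one at position 0 yields a bare "e".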

In any case, you can easily work with code points (runes) by converting the string to a slice and indexing into it:

package main

import "fmt"

func main() {
        utf8 := "Hello, 世界"
        runes := []rune(utf8)
        fmt.Printf("utf8:% 02x\nrunes: %#v\n", []byte(utf8), runes)
}

Also here: http://play.golang.org/p/qWVSA-n93o

Note: the desire to access Unicode "characters" by index is often a design mistake. Most textual data is processed sequentially.
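As one example of sequential processing, the sketch below truncates a string to a maximum number of runes by walking it once and slicing at a byte offset, with no index lookups. `truncateRunes` is a hypothetical helper name, and it counts code points, not user-perceived characters.

```go
package main

import "fmt"

// truncateRunes returns the prefix of s that contains at most n runes.
// Hypothetical helper for illustration; it never splits a multi-byte rune.
func truncateRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for pos := range s {
		if count == n {
			return s[:pos] // pos is the byte offset where rune number n starts
		}
		count++
	}
	return s // s has n runes or fewer
}

func main() {
	fmt.Println(truncateRunes("Hello, 世界", 8)) // prints: Hello, 世
}
```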

Answer 3

Score: 0

Another option is the package utf8string:

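package main

import "golang.org/x/exp/utf8string"

func main() {
   // NewString wraps the string so that runes can be accessed by index.
   s := utf8string.NewString("🧡💛💚💙💜")
   t := s.At(2) // rune at index 2 (zero-based)
   println(t == '💚') // true
}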

https://pkg.go.dev/golang.org/x/exp/utf8string

huangapple
  • Posted on 2012-10-29 18:36:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/13119937.html