Access random rune element of string without using for ... range

Question

I recently asked this question and the answers increased my understanding, but they didn't solve the actual problem I had. So, I will try to ask a similar but different question as follows.

Suppose that I want to access a random rune element of a string. One way is:

func RuneElement(str string, idx int) rune {
  n := 0 // rune counter; the index yielded by range is a byte offset, not a rune position
  for _, c := range str {
    if n == idx {
      return c
    }
    n++
  }
  return 0 // out of range -> proper handling is needed
}

What if I want to call such a function many times? I guess what I am looking for is an operator or function like str[i] (which returns a byte) that returns the rune at the i-th position. Why can this element be accessed using for ... range but not through a function like str.At(i), for example?

Answer 1

Score: 4

string values in Go store the UTF-8 encoded byte sequence of the text. This is a design decision that has been made and it won't change.
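To make the byte-versus-rune distinction concrete, here is a minimal sketch. The helper name byteAndRuneCounts is made up for illustration; len and utf8.RuneCountInString are standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts (hypothetical helper) returns the number of bytes
// and the number of runes in s.
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "h\u00e9llo" // "héllo": 'é' occupies two bytes in UTF-8
	nb, nr := byteAndRuneCounts(s)
	fmt.Println(nb, nr) // 6 5
	fmt.Println(s[1])   // 195: the first byte of 'é', not the rune itself
}
```

Indexing with s[i] is constant time precisely because it addresses bytes; counting runes requires walking the whole string.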

If you want to efficiently get a rune from it at an arbitrary index, you have to decode the bytes; there is no way around that (for ... range does this decoding). There is no "shortcut". The chosen representation just doesn't provide this out of the box.
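The decoding that for ... range performs can also be written out by hand. The sketch below is one possible way to do it, not something from the question: runeAtScan is a hypothetical name, and it relies only on utf8.DecodeRuneInString from the standard library. Each call is linear in the position of the requested rune.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAtScan (hypothetical helper) returns the idx-th rune of s,
// counting runes rather than bytes. The bool is false when idx is
// out of range.
func runeAtScan(s string, idx int) (rune, bool) {
	if idx < 0 {
		return 0, false
	}
	for len(s) > 0 {
		// Decode one rune from the front; size is its width in bytes.
		r, size := utf8.DecodeRuneInString(s)
		if idx == 0 {
			return r, true
		}
		idx--
		s = s[size:]
	}
	return 0, false
}

func main() {
	r, ok := runeAtScan("h\u00e9llo", 1)
	fmt.Printf("%c %v\n", r, ok) // é true
	_, ok = runeAtScan("h\u00e9llo", 10)
	fmt.Println(ok) // false
}
```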

If you have to do this frequently, you should change your input type: use a []rune instead of a string, since it is a slice and can be indexed efficiently. A string in Go is not a []rune; it is effectively a read-only []byte (UTF-8). Period.
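As a rough illustration of that advice, the sketch below pays the decoding cost once and then does several constant-time lookups. The helper runesAt is a made-up name; returning 0 for an out-of-range index is an assumption of this sketch.

```go
package main

import "fmt"

// runesAt (hypothetical helper) converts s to []rune once and then
// looks up every requested rune position. Out-of-range positions
// yield 0.
func runesAt(s string, idxs ...int) []rune {
	rs := []rune(s) // single linear decode of the whole string
	out := make([]rune, 0, len(idxs))
	for _, i := range idxs {
		if i < 0 || i >= len(rs) {
			out = append(out, 0)
			continue
		}
		out = append(out, rs[i]) // plain slice indexing
	}
	return out
}

func main() {
	got := runesAt("h\u00e9llo, 世界", 1, 7, 8)
	fmt.Println(string(got)) // é世界
}
```

The conversion copies the text and uses four bytes per rune, so it pays off mainly when the same text is indexed many times.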

If you can't change the input type, you may build an internal cache mapped from string to its []rune:

var cache = map[string][]rune{}

func RuneAt(s string, idx int) rune {
	rs, ok := cache[s]
	if !ok {
		rs = []rune(s) // decode once and remember the result
		cache[s] = rs
	}
	if idx < 0 || idx >= len(rs) {
		return 0 // out of range
	}
	return rs[idx]
}

Whether this is worth it depends on the use case: if RuneAt() is called with a small set of strings, this may improve performance a lot. If the passed strings are more or less unique, this will result in worse performance and a lot of memory usage. Also, this implementation is not safe for concurrent use.
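If concurrent use is needed, one possible variant guards the map with a sync.RWMutex. This is only a sketch under that assumption: the type runeCache and its constructor are hypothetical names, and the cache still grows without bound.

```go
package main

import (
	"fmt"
	"sync"
)

// runeCache is a hypothetical concurrency-safe variant of the cache above.
type runeCache struct {
	mu sync.RWMutex
	m  map[string][]rune
}

func newRuneCache() *runeCache {
	return &runeCache{m: make(map[string][]rune)}
}

// RuneAt returns the idx-th rune of s. The bool is false when idx is
// out of range.
func (c *runeCache) RuneAt(s string, idx int) (rune, bool) {
	c.mu.RLock()
	rs, found := c.m[s]
	c.mu.RUnlock()
	if !found {
		// Decode outside the lock. Two goroutines may both decode the
		// same string; the results are identical, so the race is harmless.
		rs = []rune(s)
		c.mu.Lock()
		c.m[s] = rs
		c.mu.Unlock()
	}
	if idx < 0 || idx >= len(rs) {
		return 0, false
	}
	return rs[idx], true
}

func main() {
	c := newRuneCache()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.RuneAt("h\u00e9llo", 1)
		}()
	}
	wg.Wait()
	r, ok := c.RuneAt("h\u00e9llo", 1)
	fmt.Printf("%c %v\n", r, ok) // é true
}
```

A read lock covers the common case of a cache hit, so lookups for already-seen strings do not block each other.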

huangapple
  • Published on June 14, 2017 at 00:42:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/44527223.html