Access random rune element of string without using for ... range
Question
I recently asked this question and the answers improved my understanding, but they didn't solve the actual problem I had. So I will try asking a similar but different question.
Suppose that I want to access a random rune element of a string. One way is:
    func RuneElement(str string, idx int) rune {
        n := 0 // rune counter: the index yielded by range is a byte offset, not a rune position
        for _, c := range str {
            if n == idx {
                return c
            }
            n++
        }
        return 0 // out of range -> proper handling is needed
    }
What if I want to call such a function many times? I guess what I am looking for is an operator or function similar to str[i] (which returns a byte) that returns the rune element at the i-th position. Why can this element be accessed using for ... range but not through a function like str.At(i), for example?
Answer 1
Score: 4
string values in Go store the UTF-8 encoded byte sequence of the text. This is a design decision that has been made and it won't change.
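If you want to get a rune from it at an arbitrary index, you have to decode the bytes; there is no way around that (the for ... range statement performs this decoding). UTF-8 is a variable-width encoding (1 to 4 bytes per rune), so the byte offset of the i-th rune cannot be computed without scanning the bytes before it. There is no "shortcut". The chosen representation just doesn't provide rune indexing out of the box.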
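To make the decoding step concrete, here is a minimal sketch of what any rune-by-position lookup on a string has to do. It uses utf8.DecodeRuneInString from the standard unicode/utf8 package; the function name RuneAtIndex and its (rune, bool) result are assumptions made for illustration, not part of the question or of any library.

```go
package main

import "unicode/utf8"

// RuneAtIndex returns the idx-th rune of s, counting runes rather than bytes.
// The second result is false when idx is out of range.
// Hypothetical helper: the name and signature are illustrative only.
func RuneAtIndex(s string, idx int) (rune, bool) {
	if idx < 0 {
		return 0, false
	}
	for len(s) > 0 {
		// Decode one rune from the front; size is its width in bytes (1 to 4).
		r, size := utf8.DecodeRuneInString(s)
		if idx == 0 {
			return r, true
		}
		idx--
		s = s[size:] // skip past the rune just decoded
	}
	return 0, false
}
```

For s := "héllo", s[1] yields the byte 0xC3 (the first byte of the two-byte encoding of é), whereas RuneAtIndex(s, 1) walks the bytes and yields 'é'. Each call costs time proportional to the requested position.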
If you have to do this frequently, you should change your input type and use a []rune instead of a string, as it is a slice and can be indexed efficiently. A string in Go is not a []rune; it is effectively a read-only []byte (UTF-8 encoded). Period.
If you can't change the input type, you may build an internal cache that maps each string to its []rune:
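    var cache = map[string][]rune{}

    // RuneAt returns the idx-th rune of s, or 0 if idx is out of range.
    func RuneAt(s string, idx int) rune {
        rs, ok := cache[s]
        if !ok {
            rs = []rune(s) // decode once, then reuse on later calls
            cache[s] = rs
        }
        if idx < 0 || idx >= len(rs) {
            return 0
        }
        return rs[idx]
    }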
Whether this is worth it depends on the case: if RuneAt() is called with a small set of strings, this may improve performance a lot. If the passed strings are more or less unique, this will result in worse performance and a lot of memory usage. Also, this implementation is not safe for concurrent use.
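If concurrent callers are a concern, one possible variant guards the map with a sync.RWMutex. This is only a sketch under that assumption; the runeCache type and newRuneCache constructor are hypothetical names, and other approaches (such as sync.Map) may fit better depending on the workload.

```go
package main

import "sync"

// runeCache is a hypothetical mutex-guarded version of the cache shown above.
type runeCache struct {
	mu sync.RWMutex
	m  map[string][]rune
}

func newRuneCache() *runeCache {
	return &runeCache{m: make(map[string][]rune)}
}

// RuneAt returns the idx-th rune of s, or 0 when idx is out of range.
func (c *runeCache) RuneAt(s string, idx int) rune {
	c.mu.RLock()
	rs, ok := c.m[s]
	c.mu.RUnlock()
	if !ok {
		rs = []rune(s) // decode outside the lock to keep the critical section short
		c.mu.Lock()
		c.m[s] = rs
		c.mu.Unlock()
	}
	if idx < 0 || idx >= len(rs) {
		return 0
	}
	return rs[idx]
}
```

Two goroutines may occasionally decode the same string at the same time; both store an equivalent slice, so the result is still correct. The map still grows without bound, so the memory caveat for mostly unique strings applies here as well.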