How expensive is []byte(string)?

Question

Let's convert a string to a []byte:

```go
func toBytes(s string) []byte {
	return []byte(s) // What happens here?
}
```

How expensive is this cast operation? Is copying performed? As far as I can see in the Go specification, strings behave like slices of bytes but are immutable, so this should involve at least a copy, to be sure that subsequent slice operations will not modify our string s. What happens with the reverse conversion? Does a []byte <-> string conversion involve encoding/decoding, like utf8 <-> runes?

Answer 1

Score: 43


The []byte(s) expression is not a cast but a conversion. Some conversions behave like a cast, such as uint(myIntvar), which just reinterprets the bits in place. Unfortunately, that is not the case for the string to byte slice conversion. Byte slices are mutable, while strings (string values, to be precise) are not. The result is that a copy of the string must be made (memory allocation plus content transfer). So yes, it can be costly in some scenarios.
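A minimal sketch of the copy described above; the helper name toBytesAndMutate is hypothetical. Writing through the slice returned by []byte(s) leaves the original string unchanged, which is only possible because the conversion copied the bytes into a new backing array.

```go
package main

import "fmt"

// toBytesAndMutate (hypothetical helper) converts s to a byte slice,
// overwrites the slice's first byte, and returns both the original
// string and the slice. Because []byte(s) copies the string's bytes,
// s itself is unaffected by the write.
func toBytesAndMutate(s string) (string, []byte) {
	b := []byte(s) // new backing array; s's bytes are copied into it
	if len(b) > 0 {
		b[0] = 'X'
	}
	return s, b
}

func main() {
	s, b := toBytesAndMutate("hello")
	fmt.Println(s)         // hello
	fmt.Println(string(b)) // Xello
}
```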

EDIT: No encoding transformation is performed. The string (source) bytes are copied to the slice (destination) bytes as they are.
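To illustrate the EDIT above, here is a small sketch (roundTrip is a hypothetical helper) using a string that contains a two-byte UTF-8 character and a byte that is not valid UTF-8. The conversion keeps the byte length and the raw bytes as they are, and a round trip through []byte reproduces the original string exactly.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// roundTrip (hypothetical helper) converts s to []byte and back to string.
// Neither conversion inspects, validates, or re-encodes the bytes.
func roundTrip(s string) string {
	return string([]byte(s))
}

func main() {
	// "é" is encoded as the two bytes 0xc3 0xa9; 0xff is not valid UTF-8.
	s := "a\xffé"
	b := []byte(s)

	fmt.Println(len(s), len(b))      // 4 4
	fmt.Printf("% x\n", b)           // 61 ff c3 a9
	fmt.Println(utf8.ValidString(s)) // false
	fmt.Println(roundTrip(s) == s)   // true
}
```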

Answer 2

Score: 13


The conversion copies the bytes, but it also allocates space for the []byte, typically on the heap. In cases where you convert strings to []byte repeatedly, you might save memory-management time by reusing one []byte and filling it with the copy built-in. (See http://golang.org/ref/spec#Appending_and_copying_slices and the special case for using a string as the source.)
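A sketch of the buffer-reuse idea, assuming a hypothetical helper fillBuffer. It relies on the spec's special case that copy accepts a string as its source when the destination is a []byte, and it allocates a new backing array only when the existing one is too small.

```go
package main

import "fmt"

// fillBuffer (hypothetical helper) copies s into buf, reusing buf's
// backing array when its capacity is large enough and allocating a
// larger one only when it is not.
func fillBuffer(buf []byte, s string) []byte {
	if cap(buf) < len(s) {
		buf = make([]byte, len(s)) // grow only when needed
	}
	buf = buf[:len(s)]
	copy(buf, s) // special case: a string is allowed as the source
	return buf
}

func main() {
	buf := make([]byte, 0, 64) // one allocation, reused for every string below
	for _, s := range []string{"alpha", "beta", "gamma"} {
		buf = fillBuffer(buf, s)
		fmt.Printf("%s len=%d cap=%d\n", buf, len(buf), cap(buf))
	}

	// copy never grows dst: it copies min(len(dst), len(src)) bytes
	// and returns that count.
	dst := make([]byte, 3)
	n := copy(dst, "hello")
	fmt.Println(n, string(dst)) // 3 hel
}
```

The same spec section also allows append(buf[:0], s...) with a string operand, which grows the slice automatically when needed.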

In both cases, the conversion and the copy built-in, the copy itself is a straight byte copy, which should run very quickly. I would expect the compiler to generate some kind of repeated-move instruction that the CPU executes efficiently.

The reverse conversion, making a string out of a byte slice, likewise involves allocating new memory for the string, usually on the heap. The immutability property forces this: the string must not change if the slice is modified later. Sometimes you can optimize by doing as much work as possible with []byte and then creating a string only at the end. The bytes.Buffer type is often useful for this.
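A minimal sketch of that pattern, with a hypothetical joinWithCommas helper: the result is assembled in a bytes.Buffer and converted to a string once, at the end. The second half of main shows that string(b) also copies, so later writes to b are not visible through s.

```go
package main

import (
	"bytes"
	"fmt"
)

// joinWithCommas (hypothetical helper) builds its result in a
// bytes.Buffer, a growable []byte, and converts to a string only once.
func joinWithCommas(parts []string) string {
	var buf bytes.Buffer
	for i, p := range parts {
		if i > 0 {
			buf.WriteByte(',')
		}
		buf.WriteString(p)
	}
	return buf.String() // the single []byte -> string copy
}

func main() {
	fmt.Println(joinWithCommas([]string{"a", "b", "c"})) // a,b,c

	// string(b) copies too: later writes to b do not show up in s.
	b := []byte("abc")
	s := string(b)
	b[0] = 'z'
	fmt.Println(s, string(b)) // abc zbc
}
```

Since Go 1.10, strings.Builder covers the same use case, and its String method avoids that final copy.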

Now, about the red herring: encoding and UTF-8 are not an issue here. Strings and []byte can both hold arbitrary data. The copy does not look at the data; it just copies it. Choose your words carefully when saying things like "strings are intended to contain UTF-8" or "this is encouraged". It is more accurate to simply note that some language features, such as the range clause of a for statement, interpret strings as UTF-8. Just learn what interprets strings as UTF-8 and what doesn't. Have non-UTF-8 data in a string and need to iterate over it byte-wise? No problem: just don't use the range clause.

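```go
package main

import "fmt"

func main() {
	s := "string"
	for i := 0; i < len(s); i++ {
		b := s[i]           // s[i] is a byte; no UTF-8 decoding happens
		fmt.Printf("%c", b) // work with b
	}
}
```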

This is idiomatic Go. It is not discouraged and it violates no intention. It simply iterates over the string byte-wise, which is sometimes just what you want to do.
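To contrast the two loops, here is a sketch with hypothetical helpers bytesOf and runesOf over a string holding "é" (bytes 0xc3 0xa9) followed by the invalid byte 0xff. Indexing yields the three raw bytes; the range clause decodes UTF-8, returning one rune for "é" and utf8.RuneError (U+FFFD) for the invalid byte.

```go
package main

import "fmt"

// bytesOf (hypothetical helper) walks s byte-wise with an index loop;
// nothing is decoded.
func bytesOf(s string) []byte {
	out := make([]byte, 0, len(s))
	for i := 0; i < len(s); i++ {
		out = append(out, s[i])
	}
	return out
}

// runesOf (hypothetical helper) uses the range clause, which decodes s
// as UTF-8. Each invalid byte is reported as utf8.RuneError (U+FFFD)
// and consumes one byte.
func runesOf(s string) []rune {
	var out []rune
	for _, r := range s {
		out = append(out, r)
	}
	return out
}

func main() {
	s := "é\xff"
	fmt.Printf("% x\n", bytesOf(s)) // c3 a9 ff
	fmt.Printf("%U\n", runesOf(s))  // [U+00E9 U+FFFD]
}
```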

Answer 3

Score: 1


As a complement to the answers above: the latest Go language specification states the special rule for conversions between numeric types, and to and from string types, as follows:

> Specific rules apply to (non-constant) conversions between numeric types or to and from a string type. These conversions may change the representation of x and incur a run-time cost. All other conversions only change the type but not the representation of x.
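A short sketch of the distinction drawn in that quote; the named type Bytes is a hypothetical example. Converting between []byte and a type with the same underlying type only changes the type, so both slices share one backing array. A conversion to string copies, and a float64 to int conversion changes the representation (it truncates toward zero).

```go
package main

import "fmt"

// Bytes is a hypothetical named type whose underlying type is []byte.
type Bytes []byte

func main() {
	b := []byte("abc")

	// []byte -> Bytes: same underlying type, so only the type changes.
	// The result shares b's backing array.
	nb := Bytes(b)
	nb[0] = 'X'
	fmt.Println(string(b)) // Xbc

	// []byte -> string: a conversion "to and from a string type",
	// which copies the bytes into new storage.
	s := string(b)
	b[0] = 'Y'
	fmt.Println(s) // Xbc

	// float64 -> int: a numeric conversion that changes the
	// representation (truncation toward zero), not just the type.
	f := 3.9
	fmt.Println(int(f)) // 3
}
```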

huangapple
  • Posted on 2013-01-17 14:53:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/14373634.html