What are the possible consequences of using unsafe conversion from []byte to string in go?

Question

The preferred way of converting []byte to string is this:

var b []byte
// fill b
s := string(b)

In this code the byte slice is copied, which can be a problem in situations where performance is important.

When performance is critical, one can consider performing the unsafe conversion:

var b []byte
// fill b
s := *(*string)(unsafe.Pointer(&b))

My question is: what can go wrong when using the unsafe conversion? I know that a string should be immutable, and that if we change b, s will also change. But so what? Is that all the harm that can come of it?

Answer 1

Score: 5

Modifying something that the language spec guarantees to be immutable is an act of treason.

Since the spec guarantees that strings are immutable, compilers are allowed to generate code that caches their values and performs other optimizations based on this. You can't change the value of a string in any normal way, and if you resort to dirty ways (like package unsafe) to do it anyway, you lose all the guarantees provided by the spec. By continuing to use the modified strings, you may run into "bugs" and unexpected behavior at random.

For example, if you use a string as a key in a map and you change the string after you put it into the map, you might not be able to find the associated value in the map using either the original or the modified value of the string (this is implementation dependent).

To demonstrate this, see this example:

m := map[string]int{}
b := []byte("hi")
s := *(*string)(unsafe.Pointer(&b))
m[s] = 999

fmt.Println("Before:", m)

b[0] = 'b'
fmt.Println("After:", m)

fmt.Println("But it's there:", m[s], m["bi"])

for i := 0; i < 1000; i++ {
    m[strconv.Itoa(i)] = i
}
fmt.Println("Now it's GONE:", m[s], m["bi"])
for k, v := range m {
    if k == "bi" {
        fmt.Println("But still there, just in a different bucket: ", k, v)
    }
}

Output (try it on the Go Playground):

Before: map[hi:999]
After: map[bi:<nil>]
But it's there: 999 999
Now it&#39;s GONE: 0 0
But still there, just in a different bucket:  bi 999

At first, we just see a weird result: a simple Println() is not able to find the value. It sees something (the key is found), but the value is displayed as nil, which is not even a valid value for the value type int (the zero value for int is 0).

If we grow the map (by adding 1000 elements), its internal data structure gets restructured. After this, we cannot find our value even by asking for it explicitly with the appropriate key. It is still in the map, since iterating over all key-value pairs finds it, but because the hash code changes when the value of the string changes, the lookup most likely searches a different bucket than the one where the entry is (or where it should be).
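To make the bucket argument concrete, here is a rough sketch of why a lookup can go astray once the key's bytes change. It uses FNV-1a from hash/fnv purely as a stand-in: the Go runtime uses its own seeded hash function, and the bucket count of 8 as well as the helper names keyHash and bucketFor are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// keyHash is a hypothetical stand-in for the runtime's string hash.
// The real map implementation uses a different, seeded hash function.
func keyHash(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s)) // Write on a hash.Hash never returns an error
	return h.Sum32()
}

// bucketFor maps a key to one of n buckets, as a simple hash table would.
func bucketFor(s string, n uint32) uint32 {
	return keyHash(s) % n
}

func main() {
	const buckets = 8 // assumed bucket count, for illustration only

	// The entry was stored while the key read "hi" ...
	fmt.Println("stored under hash:", keyHash("hi"), "bucket:", bucketFor("hi", buckets))
	// ... but after b[0] = 'b' every lookup hashes "bi" instead.
	fmt.Println("looked up by hash:", keyHash("bi"), "bucket:", bucketFor("bi", buckets))
}
```

The two hashes differ, so unless both happen to fall into the same bucket, the lookup searches a place where the entry was never stored.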

Also note that code using package unsafe may work as you expect now, but the same code might behave completely differently (meaning it may break) with a future (or older) version of Go, as "packages that import unsafe may be non-portable and are not protected by the Go 1 compatibility guidelines".
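As one illustration of how this area has moved between releases: Go 1.20 added unsafe.String and unsafe.SliceData, which express the same zero-copy conversion without relying on the memory layout of slice and string headers. The sketch below assumes Go 1.20 or newer, and bytesToString is a hypothetical helper name. It does not make mutation any safer: the caller still must never modify b while the resulting string is in use.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString (hypothetical helper) returns a string that shares b's
// backing array instead of copying it. Requires Go 1.20 or newer.
// The caller must not modify b for as long as the result is in use.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := []byte("hi")
	s := bytesToString(b)
	// b is left untouched from here on, so s stays stable.
	fmt.Println(s, len(s)) // prints "hi 2"
}
```

Everything said in this answer about map keys and comparisons applies to a string obtained this way as well, the moment b is written to.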

You may also run into unexpected errors because the modified string might be used in different ways: some code might just copy the string header, other code may copy its content. See this example:

b := []byte{'h', 'i'}
s := *(*string)(unsafe.Pointer(&b))

s2 := s                 // Copy string header
s3 := string([]byte(s)) // New string header but same content
fmt.Println(s, s2, s3)
b[0] = 'b'

fmt.Println(s == s2)
fmt.Println(s == s3)

We created two new local variables from s: s2 is initialized by copying the string header of s, while s3 is initialized with a new string value (a new string header) that has the same content. If you now modify the original s, a correct program should give the same result when comparing either new string to the original, be it true or false (which one depends on whether values were cached, but both comparisons should agree).

But the output is (try it on the Go Playground):

hi hi hi
true
false
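For contrast, here is a minimal sketch of the same experiment with the ordinary copying conversion s := string(b). Both comparisons agree after b is modified, because s owns its own bytes. The function name safeCompare is made up for this example.

```go
package main

import "fmt"

// safeCompare (hypothetical name) repeats the experiment with the copying
// conversion and reports the two comparison results.
func safeCompare() (bool, bool) {
	b := []byte{'h', 'i'}
	s := string(b) // copies the bytes; s no longer depends on b

	s2 := s                 // copy of the string header
	s3 := string([]byte(s)) // fresh copy of the content
	b[0] = 'b'              // changes b only

	return s == s2, s == s3
}

func main() {
	eq2, eq3 := safeCompare()
	fmt.Println(eq2, eq3) // prints "true true"
}
```

The price is the one extra copy the question set out to avoid; what it buys is that the language's guarantees about strings keep holding.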

huangapple
  • Posted on November 27, 2015, 15:35:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/33952378.html