Byte slice converted with unsafe from string changes its address

Question

I have this function to convert a string to a byte slice without copying:

func StringToByteUnsafe(s string) []byte {
	strh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	var sh reflect.SliceHeader
	sh.Data = strh.Data
	sh.Len = strh.Len
	sh.Cap = strh.Len
	return *(*[]byte)(unsafe.Pointer(&sh))
}

It works fine, but one very specific setup produces very strange behavior:

The setup is here: https://github.com/leviska/go-unsafe-gc/blob/main/pkg/pkg_test.go

What happens:

  1. Create a byte slice
  2. Convert it into a temporary (rvalue) string, then use unsafe to convert it back into a byte slice
  3. Then, copy this slice (by reference)
  4. Then, do something with the second slice inside a goroutine
  5. Print the pointers before and after

I get this output on my Linux Mint laptop with Go 1.16:

go test ./pkg -v -count=1
=== RUN   TestSomething
0xc000046720 123 0xc000046720 123
0xc000076f20 123 0xc000046721 z
--- PASS: TestSomething (0.84s)
PASS
ok      github.com/leviska/go-unsafe-gc/pkg     0.847s

So the first slice magically changes its address, while the second one doesn't.

If we replace the goroutine with a call to runtime.GC() (and maybe play with the code a little), we can get both pointers to change (to the same new value).

If we replace the unsafe cast with a plain []byte() conversion, none of the addresses change. Also, if we switch to the unsafe cast from https://stackoverflow.com/a/66218124/5516391, everything works correctly as well:

func StringToByteUnsafe(str string) []byte { // this works fine
	var buf = *(*[]byte)(unsafe.Pointer(&str))
	(*reflect.SliceHeader)(unsafe.Pointer(&buf)).Cap = len(str)
	return buf
}

I ran it with GOGC=off and got the same result. I ran it with -race and got no errors.

If you run this as a main package with a main function, it seems to work correctly. The same is true if you remove the Convert function. My guess is that the compiler optimizes things differently in these cases.

So, I have several questions about this:

  1. What the hell is happening? It looks like some weird UB.
  2. Why and how does the Go runtime magically change the address of the variable?
  3. Why, without concurrency, can both addresses change, while with concurrency only one does?
  4. What's the difference between this unsafe cast and the cast from the Stack Overflow answer? Why does that one work?

Or is this just a compiler bug?

Here is a copy of the full code from GitHub; put it in a package and run it as a test:

package pkg_test

import (
	"fmt"
	"reflect"
	"sync"
	"testing"
	"unsafe"
)

func StringToByteUnsafe(s string) []byte {
	strh := (*reflect.StringHeader)(unsafe.Pointer(&s))
	var sh reflect.SliceHeader
	sh.Data = strh.Data
	sh.Len = strh.Len
	sh.Cap = strh.Len
	return *(*[]byte)(unsafe.Pointer(&sh))
}

func Convert(s []byte) []byte {
	return StringToByteUnsafe(string(s))
}

type T struct {
	S []byte
}

func Copy(s []byte) T {
	return T{S: s}
}

func Mid(a []byte, b []byte) []byte {
	fmt.Printf("%p %s %p %s\n", a, a, b, b)
	wg := sync.WaitGroup{}
	wg.Add(1)
	go func() {
		b = b[1:2]
		wg.Done()
	}()
	wg.Wait()
	fmt.Printf("%p %s %p %s\n", a, a, b, b)
	return b
}

func TestSomething(t *testing.T) {
	str := "123"
	a := Convert([]byte(str))
	b := Copy(a)
	Mid(a, b.S)
}

Answer 1

Score: 0

Answer from the GitHub issue https://github.com/golang/go/issues/47247:

> The backing store of a is allocated on stack, because it does not
> escape. And goroutine stacks can move dynamically. b, on the other
> hand, escapes to heap, because it is passed to another goroutine. In
> general, we don't assume the address of an object don't change.
>
> This works as intended.
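A small illustration of the mechanism described in the quote. This is a sketch written for this page, not code from the issue, and grow and observe are hypothetical helpers. A local array that does not escape lives on the goroutine stack; deep recursion forces that stack to grow, and the runtime grows a stack by copying it to a new location. A real pointer such as p is updated during the copy, while an address saved earlier as a uintptr is not. Whether and where the stack actually moves depends on the Go version and runtime, so only the value read through p should be relied on.

```go
package main

import (
	"fmt"
	"unsafe"
)

// grow recurses with a 1 KiB frame so that the goroutine's small initial
// stack has to grow; the runtime grows a stack by copying it elsewhere.
//
//go:noinline
func grow(depth int) int {
	var pad [1024]byte
	pad[depth%len(pad)] = byte(depth)
	if depth == 0 {
		return int(pad[0])
	}
	return grow(depth-1) + int(pad[depth%len(pad)])
}

// observe reads a stack-allocated array through a real pointer after the
// stack has grown, and reports whether the array's address changed.
func observe() (byte, bool) {
	var local [8]byte // does not escape, so it lives on this goroutine's stack
	local[0] = 42
	// p is a real pointer: the runtime adjusts it when the stack is copied.
	p := &local
	// before is a plain integer: nothing ever updates it.
	before := uintptr(unsafe.Pointer(p))
	grow(256)
	return p[0], before != uintptr(unsafe.Pointer(p))
}

func main() {
	v, moved := observe()
	fmt.Println("value through pointer:", v)
	fmt.Println("stack object moved:", moved) // typically true, not guaranteed
}
```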

And my version is incorrect because

> it uses reflect.SliceHeader as plain struct. You can run go vet on it,
> and go vet will warn you.
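On Go 1.20 or later (newer than the Go 1.16 used in the question), the same zero-copy conversion can be written without reflect headers at all. The sketch below is an illustration, not code from the issue thread, and StringToBytes and sameData are hypothetical names. Because unsafe.StringData returns a real *byte rather than a uintptr, the compiler can see that the result points at the string's bytes, and there is no hand-built reflect.SliceHeader for go vet to flag.

```go
package main

import (
	"fmt"
	"unsafe"
)

// StringToBytes (hypothetical name) returns a []byte that shares memory
// with s instead of copying it. Requires Go 1.20+ for unsafe.StringData.
// The result must never be written to: string bytes are immutable.
func StringToBytes(s string) []byte {
	// unsafe.StringData yields a real *byte, so escape analysis can track
	// that the returned slice points at s's bytes.
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// sameData reports whether b starts at the same address as s's bytes.
func sameData(s string, b []byte) bool {
	return unsafe.SliceData(b) == unsafe.StringData(s)
}

func main() {
	s := "123"
	b := StringToBytes(s)
	fmt.Println(string(b), len(b), cap(b), sameData(s, b)) // 123 3 3 true
}
```

As with the original function, the returned slice must be treated as read-only, since writing to it would modify memory that belongs to the string.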

huangapple
  • Posted on July 16, 2021 at 06:19:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68401381.html