Go makes a heap allocation when doing a type assertion on a string

Question

The following benchmark reports no allocations:

func Benchmark_maybeString(b *testing.B) {
	str := "foobar"

	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		_ = maybeString(str)
	}
}

func maybeString(val any) string {
	switch val.(type) {
	case string:
		return val.(string)
	default:
		return ""
	}
}

Benchmark_maybeString        0.3118 ns/op	       0 B/op	       0 allocs/op

However, changing the default case, which doesn't even run in the benchmark, to something more complicated suddenly makes the function allocate:

func maybeString(val any) string {
	switch val.(type) {
	case string:
		return val.(string)
	default:
		return fmt.Sprintf("not a string: %T", val)
	}
}

Benchmark_maybeString	        18.65 ns/op	      16 B/op	       1 allocs/op

Why? If it is because the string escapes to the heap, does that mean Go can allocate strings on the stack? And what about the following version, where I assume the string is on the heap from the very beginning, yet the call still allocates:

data := map[string]string{
	"foobar": time.Now().Format(time.RFC3339),
}
maybeString(data["foobar"])

What I have figured out so far

  1. Changing str := "foobar" to const str = "foobar" or var str any = "foobar" results in 0 allocations in both versions. Why? Are strings in Go somehow not always allocated the same way?

  2. The allocating version calls runtime.convTstring() before performing the type assertion (CMPL). But what does the go:string."foobar"(SB) syntax mean?

Answer 1

Score: 2

The allocation happens because fmt.Sprintf takes interface arguments, so the compiler converts your string into an interface in order to call it (that is the runtime.convTstring you see). Apparently the compiler is not very smart here:

  1. The allocation is only needed in the default case, so it should not be placed before the type switch. Pushing the conversion down to where it is needed would remove the allocation from the string path.

  2. The other option is to hoist the allocation out of the loop, which would also eliminate the allocation inside the loop. This requires the compiler to know that the pointer in the iface struct points to a constant, though.

  3. The type switch should be folded. I believe this change will fix this.
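Until the compiler does this itself, the second option can be approximated by hand: box the string once and reuse the interface value. The sketch below is not part of the original answer. It assumes the allocating maybeString from the question and uses testing.AllocsPerRun from the standard library to compare the two call patterns. The names sink, perCall, and hoisted are illustrative, and the reported counts depend on the Go version and on inlining decisions, so no particular output is promised.

```go
package main

import (
	"fmt"
	"testing"
)

// maybeString is the allocating variant from the question: the default
// case passes val to fmt.Sprintf, which makes the parameter escape.
func maybeString(val any) string {
	switch v := val.(type) {
	case string:
		return v
	default:
		return fmt.Sprintf("not a string: %T", val)
	}
}

// sink (hypothetical name) keeps results alive so the calls are not
// optimized away.
var sink string

func main() {
	str := "foobar"

	// The string is converted to an interface on every call.
	perCall := testing.AllocsPerRun(1000, func() {
		sink = maybeString(str)
	})

	// Conversion hoisted by hand: box once, then reuse the interface value.
	var boxed any = str
	hoisted := testing.AllocsPerRun(1000, func() {
		sink = maybeString(boxed)
	})

	fmt.Printf("per-call conversion: %.0f allocs/run\n", perCall)
	fmt.Printf("hoisted conversion:  %.0f allocs/run\n", hoisted)
}
```

If the first number is non-zero and the second is zero on your toolchain, the cost is in the string-to-interface conversion rather than in the type switch itself.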

For your questions:

  1. Changing str := "foobar" to const str = "foobar" lets the compiler eliminate the allocation because str is a constant that lives outside the stack, so the interface value can be built at compile time. Changing it to var str any = "foobar" moves the conversion out of the loop, so nothing is allocated inside it.

  2. go:string."foobar"(SB) is the address of the buffer that contains the bytes of "foobar". That instruction loads the address into rax, and the following one loads the length into rbx. This is because runtime.convTstring takes a string, which consists of a pointer and an int, and the Go calling convention puts the first two primitive arguments in rax and rbx, respectively.
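To make the "pointer and an int" description concrete, here is a small sketch (not from the original answer) that inspects the size of a string value and of an empty interface. Both are two machine words, but an interface's second word is a pointer to the data, so a two-word string header has to be stored somewhere that pointer can refer to. The helper names stringWords and ifaceWords are made up for illustration, and unsafe.StringData assumes Go 1.20 or later.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringWords reports how many machine words a string value occupies
// (data pointer + length).
func stringWords() uintptr {
	var s string
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

// ifaceWords reports how many machine words an empty interface value
// occupies (type pointer + data pointer).
func ifaceWords() uintptr {
	var v any
	return unsafe.Sizeof(v) / unsafe.Sizeof(uintptr(0))
}

func main() {
	s := "foobar"

	fmt.Println("string words:", stringWords())
	fmt.Println("any words:   ", ifaceWords())

	// The two parts of the string header that are passed in registers:
	// the address of the backing bytes and the length.
	fmt.Println("data pointer:", unsafe.StringData(s), "len:", len(s))
}
```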

Answer 2

Score: 1

It boils down to the fact that Go can allocate a value on the stack when escape analysis at compile time determines that it does not escape the current goroutine's stack. Introducing code that passes the string to fmt.Sprintf() as an argument makes the compiler decide that the string escapes the stack.

Versions where the string already lives outside the stack do not allocate. For example, any one of these declarations:

const str = "foobar"
var str any = "foobar"
str := map[string]any{
	"foobar": fmt.Sprintf("%v", time.Now().Format(time.RFC3339)),
}["foobar"]
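For readers who have not seen escape analysis in action, the following minimal sketch (not from the original answer) contrasts a value that can stay on the stack with one that has to outlive its frame. The type point and the functions sumLocal and newPoint are hypothetical. Building with go build -gcflags=-m prints the compiler's escape decisions; the exact wording of that output varies between Go versions, and inlining can change the outcome at a given call site.

```go
package main

import "fmt"

// point is a small illustrative type.
type point struct{ x, y int }

// sumLocal's point never leaves the function, so escape analysis can
// keep it on the stack.
func sumLocal() int {
	p := point{1, 2}
	return p.x + p.y
}

// newPoint returns the address of its local variable. The value must
// outlive the call frame, so in the non-inlined function the compiler
// moves it to the heap.
func newPoint() *point {
	p := point{1, 2}
	return &p
}

func main() {
	fmt.Println(sumLocal())
	fmt.Println(newPoint().x)
}
```

The string case in the question follows the same logic: once the boxed value can flow into fmt.Sprintf, the compiler can no longer prove that it stays within the caller's frame.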

huangapple
  • Published on April 28, 2023, 03:56:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76124171.html