Understanding Pointer Operations & CPU/Memory usage
Question
I was talking to a colleague at work about whether it's more efficient to pass a pointer to a function and/or return a pointer from one.
I put together some benchmark functions to test the different ways of doing this. The functions basically accept a variable, transform it and pass it back. We have 4 different ways of doing it:
- Pass the variable in normally, create a new variable for the result of the transformation, and pass back a copy of it.
- Pass the variable in normally, create a new variable for the result of the transformation, and pass back its memory address.
- Pass in a pointer to a variable, create a new variable for the result of the transformation, and pass back a copy of that variable.
- Pass in a pointer to a variable, perform the transformation on the value behind the pointer, and pass nothing back.
package main

import (
	"fmt"
	"testing"
)

type MyStruct struct {
	myString string
}

func acceptParamReturnVariable(s MyStruct) MyStruct {
	ns := MyStruct{
		fmt.Sprintf("I'm quoting this: \"%s\"", s.myString),
	}
	return ns
}

func acceptParamReturnPointer(s MyStruct) *MyStruct {
	ns := MyStruct{
		fmt.Sprintf("I'm quoting this: \"%s\"", s.myString),
	}
	return &ns
}

func acceptPointerParamReturnVariable(s *MyStruct) MyStruct {
	ns := MyStruct{
		fmt.Sprintf("I'm quoting this: \"%s\"", s.myString),
	}
	return ns
}

func acceptPointerParamNoReturn(s *MyStruct) {
	s.myString = fmt.Sprintf("I'm quoting this: \"%s\"", s.myString)
}

func BenchmarkNormalParamReturnVariable(b *testing.B) {
	s := MyStruct{
		myString: "Hello World",
	}
	var ns MyStruct
	for i := 0; i < b.N; i++ {
		ns = acceptParamReturnVariable(s)
	}
	_ = ns
}

func BenchmarkNormalParamReturnPointer(b *testing.B) {
	s := MyStruct{
		myString: "Hello World",
	}
	var ns *MyStruct
	for i := 0; i < b.N; i++ {
		ns = acceptParamReturnPointer(s)
	}
	_ = ns
}

func BenchmarkPointerParamReturnVariable(b *testing.B) {
	s := MyStruct{
		myString: "Hello World",
	}
	var ns MyStruct
	for i := 0; i < b.N; i++ {
		ns = acceptPointerParamReturnVariable(&s)
	}
	_ = ns
}

func BenchmarkPointerParamNoReturn(b *testing.B) {
	s := MyStruct{
		myString: "Hello World",
	}
	for i := 0; i < b.N; i++ {
		acceptPointerParamNoReturn(&s)
	}
	_ = s
}
I found the results rather surprising.
$ go test -run=XXXX -bench=. -benchmem
goos: darwin
goarch: amd64
pkg: XXXX
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkNormalParamReturnVariable-16 10538138 103.3 ns/op 48 B/op 2 allocs/op
BenchmarkNormalParamReturnPointer-16 9526380 201.2 ns/op 64 B/op 3 allocs/op
BenchmarkPointerParamReturnVariable-16 7542066 147.0 ns/op 48 B/op 2 allocs/op
BenchmarkPointerParamNoReturn-16 45897 119265 ns/op 924351 B/op 5 allocs/op
Before running this, I figured the most efficient way would be the 4th test, since no new variables are created in the scope of the called function and only memory addresses are passed around. However, the 4th one turns out to be the least efficient: it takes the most time and uses the most memory.
Could someone possibly explain this to me, or point me to some good reading that explains it?
Answer 1
Score: 1
The benchmarks you wrote don't answer the question you are asking. Microbenchmarking has proven to be extremely hard - not only in the Go world but in general.
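One concrete way the fourth benchmark differs from the others, judging only from the code in the question: acceptPointerParamNoReturn writes its result back into the same struct it reads from, so each iteration quotes the already-quoted string and the input keeps growing. A minimal sketch of that effect (lengthsAfterCalls is a hypothetical helper added here for illustration, not part of the original code):

```go
package main

import "fmt"

type MyStruct struct {
	myString string
}

// Same function as in the question: it overwrites the field it reads from.
func acceptPointerParamNoReturn(s *MyStruct) {
	s.myString = fmt.Sprintf("I'm quoting this: \"%s\"", s.myString)
}

// lengthsAfterCalls is a hypothetical helper: it calls the function n times
// on a single struct and records the string length after each call.
func lengthsAfterCalls(n int) []int {
	s := MyStruct{myString: "Hello World"}
	lengths := make([]int, 0, n)
	for i := 0; i < n; i++ {
		acceptPointerParamNoReturn(&s)
		lengths = append(lengths, len(s.myString))
	}
	return lengths
}

func main() {
	// Each call wraps the previous result, adding 20 bytes every time.
	fmt.Println(lengthsAfterCalls(3)) // prints [31 51 71]
}
```

The other three benchmarks format the same 11-byte string on every iteration, while this one formats an ever longer string, which would be consistent with its very large B/op figure. Resetting s.myString at the top of the loop is one way to make the comparison fairer.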
Back to the efficiency question. Typically, a value whose pointer is passed into a function does not escape to the heap, and typically, a value whose pointer is returned from a function does. 'Typically' is the key word here. You can't easily say in advance when the compiler will allocate something on the stack and when on the heap; this is not a trivial problem. A really good, short explanation can be found here.
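The 'typically' rule above can be checked rather than assumed. The following is only a sketch: the point type and the sum, newPoint and measure functions are made up for illustration, and the //go:noinline directives are there so that inlining does not hide the effect. It uses testing.AllocsPerRun to count heap allocations per call for each pattern.

```go
package main

import (
	"fmt"
	"testing"
)

// point is a hypothetical small struct used only for this illustration.
type point struct{ x, y int }

// sum receives a pointer but does not keep it, so the pointee
// can usually stay on the caller's stack.
//
//go:noinline
func sum(p *point) int { return p.x + p.y }

// newPoint returns the address of a local variable, so the
// variable usually has to be moved to the heap.
//
//go:noinline
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

// sink keeps the compiler from discarding the results.
var sink int

// measure reports average heap allocations per call for both patterns.
func measure() (passIn, returned float64) {
	passIn = testing.AllocsPerRun(1000, func() {
		p := point{1, 2}
		sink += sum(&p)
	})
	returned = testing.AllocsPerRun(1000, func() {
		p := newPoint(1, 2)
		sink += p.x
	})
	return passIn, returned
}

func main() {
	passIn, returned := measure()
	fmt.Println("pointer passed in, allocs per call:", passIn)
	fmt.Println("pointer returned, allocs per call:", returned)
}
```

The numbers depend on the compiler version and on inlining decisions, so treat the output as a measurement of your toolchain, not as a guarantee.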
But if you need to know, you can ask. You can start by simply printing the optimization decisions made by the compiler. You can do so by passing the -m flag to go tool compile.
go build -gcflags -m=1
If you pass an integer greater than 1, you get more verbose output. If that doesn't give you the answer you need to optimize your program, then try profiling. Profiling goes well beyond memory analysis.
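As a rough sketch of what profiling can look like, the program below records a CPU profile with the standard runtime/pprof package. The work and profileWork functions are hypothetical stand-ins for the code you actually care about, and the profile is kept in memory only to keep the example self-contained.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// work is a hypothetical stand-in for the code you want to profile.
func work(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += len(fmt.Sprintf("I'm quoting this: \"%s\"", "Hello World"))
	}
	return total
}

// profileWork runs work under the CPU profiler and returns the raw profile bytes.
func profileWork(n int) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work(n)
	pprof.StopCPUProfile() // stops sampling and flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	data, err := profileWork(1_000_000)
	if err != nil {
		fmt.Println("could not start profiler:", err)
		return
	}
	fmt.Println("profile size in bytes:", len(data))
}
```

In a real program you would more likely write the profile to a file and inspect it with go tool pprof. For benchmarks, go test can record profiles directly through its -cpuprofile and -memprofile flags.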
In general, don't bother with naive optimization decisions in your daily work. Don't get too attached to statements that begin with 'Typically...', because in the real world you never know. Always aim for correctness first, and optimize for performance only if you really need to and have proved that you need to. Do not guess, do not trust. Also, keep in mind that Go keeps changing, so what we prove in one version doesn't have to hold in another.