March 17, 2022, 13:01:22 | go

Why does my benchmark show the same fast performance when ranging over a slice by value vs. by index?

Question

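package copy // package name assumed from the "main/copy" path in the benchmark output

import "testing"

// Item has a small field plus a 1 KiB payload, so copying an element is not free.
type Item struct {
    A int
    B [1024]byte
}

// BenchmarkRange1 ranges by value: v is a copy of each element.
func BenchmarkRange1(b *testing.B) {
    s := make([]Item, 1024)
    for i := 0; i < b.N; i++ {
        for _, v := range s {
            _ = v.A
        }
    }
}

// BenchmarkRange2 ranges by index and reads the field in place.
func BenchmarkRange2(b *testing.B) {
    s := make([]Item, 1024)
    for i := 0; i < b.N; i++ {
        for i := range s {
            _ = s[i].A
        }
    }
}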

Here are the benchmark results:

go test -bench=BenchmarkRange -benchmem main/copy
goos: darwin
goarch: amd64
pkg: main/copy
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkRange1-12       4577601               260.9 ns/op             0 B/op          0 allocs/op
BenchmarkRange2-12       4697178               254.9 ns/op             0 B/op          0 allocs/op
PASS
ok      main/copy       3.391s

Doesn't ranging over a slice by value copy each element? Why is the performance the same? What optimization does the compiler apply when we range over the slice by value?

When I disable compiler optimizations with the build option "-gcflags=-N", I get the result I expected:

go test -bench=BenchmarkRange -benchmem -gcflags=-N main/copy
goos: darwin
goarch: amd64
pkg: main/copy
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkRange1-12         39004             29481 ns/op              27 B/op          0 allocs/op
BenchmarkRange2-12        777356              1572 ns/op               1 B/op          0 allocs/op
PASS
ok      main/copy       3.169s

Can anyone explain how the compiler optimizes this?

Answer 1

Score: 1

With the default optimizations, the inner loop in both BenchmarkRange1 and BenchmarkRange2 is compiled down to an empty loop with 1024 iterations, as if you had written it like this:

	for i := 0; i < 1024; i++ {

	}

In both of your examples, the compiler is smart enough to recognize that you aren't doing anything inside the inner loop: v, v.A, s[i], and s[i].A are never actually used, so the element copy and the field load can be dropped as dead code.

go.godbolt.org is a great resource for looking at the assembly the Go compiler produces. For example, the inner loop in BenchmarkRange1 gets compiled down to the following (which zeros out AX, then loops 1024 times):

        XORL    AX, AX
Range1_pc39:
        INCQ    AX
        CMPQ    AX, $1024
        JLT     Range1_pc39

You can look at the complete output here, along with handy tooltips that (usually) explain the different assembly instructions:
https://go.godbolt.org/z/raTPjTrYG

(To make your example shorter, I dropped the testing package; the //go:nosplit comments aren't really needed, but slightly simplify the resulting assembly).
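
If you want the benchmark to measure real work, one common approach is to make the loop's result observable so it cannot be discarded as dead code. The following is only a minimal sketch, not code from the question: the names sink, sumByValue, sumByIndex, BenchmarkSumByValue, and BenchmarkSumByIndex are made up for illustration. It runs as a standalone program through testing.Benchmark, but the two Benchmark functions could equally be moved into a _test.go file.

```go
package main

import (
	"fmt"
	"testing"
)

// Item mirrors the struct from the question.
type Item struct {
	A int
	B [1024]byte
}

// sink is a hypothetical package-level variable. Storing each result here
// makes it observable, so the field loads can no longer be treated as unused.
var sink int

// sumByValue ranges by value; v is semantically a copy of each element.
func sumByValue(s []Item) int {
	total := 0
	for _, v := range s {
		total += v.A
	}
	return total
}

// sumByIndex ranges by index and reads the field in place.
func sumByIndex(s []Item) int {
	total := 0
	for i := range s {
		total += s[i].A
	}
	return total
}

func BenchmarkSumByValue(b *testing.B) {
	s := make([]Item, 1024)
	for i := 0; i < b.N; i++ {
		sink = sumByValue(s)
	}
}

func BenchmarkSumByIndex(b *testing.B) {
	s := make([]Item, 1024)
	for i := 0; i < b.N; i++ {
		sink = sumByIndex(s)
	}
}

func main() {
	// Init registers the testing flags; it is needed when calling
	// testing.Benchmark outside of "go test".
	testing.Init()
	fmt.Println("by value:", testing.Benchmark(BenchmarkSumByValue))
	fmt.Println("by index:", testing.Benchmark(BenchmarkSumByIndex))
}
```

Even with a sink, the two variants may still come out close: since only v.A is read, the compiler is free to load just that field rather than copy the whole 1 KiB element. Comparing the generated assembly is the reliable way to see what actually happens on your Go version.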
