2022年7月6日 21:55:54go评论83阅读模式

英文:

Different Benchmark results for functions with same early return statement

问题

我已经为将输入数组向右旋转n次实现了两个函数。

两种实现方式都会在初始数组等于所有旋转后的结果数组时提前退出，即当n为0或数组长度的倍数时。

func rotateRight1(nums []int, n int) {
	n = n % len(nums)
	if n == 0 {
		return
	}

	lastNDigits := make([]int, n)
	copy(lastNDigits, nums[len(nums)-n:])

	copy(nums[n:], nums[:len(nums)-n])
	copy(nums[:n], lastNDigits)
}

和

func rotateRight2(nums []int, n int) {
	n = n % len(nums)
	if n == 0 {
		return
	}

	i := 0
	current := nums[i]
	iAlreadySeen := i
	for j := 0; j < len(nums); j++ {
		nextI := (i + n) % len(nums)

		nums[nextI], current = current, nums[nextI]

		i = nextI

		// 处理长度为偶数的数组，其中i+k可能等于已经遍历过的索引
		if nextI == iAlreadySeen {
			i = (i + 1) % len(nums)
			iAlreadySeen = i
			current = nums[i]
		}
	}
}

在进行基准测试时，当n等于0时，两个函数的速度差异让我感到惊讶，差异达到了20倍。

func BenchmarkRotateRight1(b *testing.B) {
	nums := make([]int, 5_000)

	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		rotateRight1(nums, 0)
	}
}

func BenchmarkRotateRight2(b *testing.B) {
	nums := make([]int, 5_000)

	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		rotateRight2(nums, 0)
	}
}

运行go test -bench=.会得到如下结果：

cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
BenchmarkRotateRight1-4         1000000000               0.4603 ns/op          0 B/op          0 allocs/op
BenchmarkRotateRight2-4         97236492                12.11 ns/op            0 B/op          0 allocs/op
PASS

我不理解这种性能差异，因为两个函数基本上都在做相同的事情，并且在if k == 0条件下都会提前退出。

有人可以帮我理解这个问题吗？

go version go1.18 linux/amd64

英文:

I've implemented two functions for rotating an input array n times to the right.

Both implementations exit early, without performing anything, if the initial array would be equal to the resulting array after all rotations. This happens anytime n is 0 or a multiple of the length of the array.

func rotateRight1(nums []int, n int) {
	n = n % len(nums)
	if n == 0 {
		return
	}

	lastNDigits := make([]int, n)
	copy(lastNDigits, nums[len(nums)-n:])

	copy(nums[n:], nums[:len(nums)-n])
	copy(nums[:n], lastNDigits)
}

and

func rotateRight2(nums []int, n int) {
	n = n % len(nums)
	if n == 0 {
		return
	}

	i := 0
	current := nums[i]
	iAlreadySeen := i
	for j := 0; j &lt; len(nums); j++ {
		nextI := (i + n) % len(nums)

		nums[nextI], current = current, nums[nextI]

		i = nextI

		// handle even length arrays where i+k might equal an already seen index
		if nextI == iAlreadySeen {
			i = (i + 1) % len(nums)
			iAlreadySeen = i
			current = nums[i]
		}
	}
}

When benchmarking, I was surprised to see a +20x difference in speed when n equals 0 for both functions.

func BenchmarkRotateRight1(b *testing.B) {
	nums := make([]int, 5_000)

	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i &lt; b.N; i++ {
		rotateRight1(nums, 0)
	}
}

func BenchmarkRotateRight2(b *testing.B) {
	nums := make([]int, 5_000)

	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i &lt; b.N; i++ {
		rotateRight2(nums, 0)
	}
}

go test -bench=. yields a result like this consistently:

cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
BenchmarkRotateRight1-4         1000000000               0.4603 ns/op          0 B/op          0 allocs/op
BenchmarkRotateRight2-4         97236492                12.11 ns/op            0 B/op          0 allocs/op
PASS

I don't understand this performance difference as both functions are basically doing the same thing and exiting early in the if k == 0 condition.

Could someone help me understand this?

go version go1.18 linux/amd64

答案1

得分: 3

你所看到的是编译器优化的结果。编译器足够聪明，通过内联函数调用来加快应用程序的速度。有时，编译器会优化以消除函数调用，并人为降低基准测试的运行时间（这可能会有些棘手）。

在进行分析之后，我们可以注意到函数rotateRight1在基准测试执行期间甚至没有被调用：

(pprof) list BenchmarkRotateRight1
Total: 260ms
ROUTINE ======================== main_test.BenchmarkRotateRight1 in (edited)
     260ms      260ms (flat, cum)   100% of Total
         .          .     43:func BenchmarkRotateRight1(b *testing.B) {
         .          .     44:	nums := make([]int, 5_000)
         .          .     45:
         .          .     46:	b.ResetTimer()
         .          .     47:	b.ReportAllocs()
     260ms      260ms     48:	for i := 0; i &lt; b.N; i++ {
         .          .     49:		rotateRight1(nums, 0)
         .          .     50:	}
         .          .     51:}

另一方面，rotateRight2被调用了，这就是你在基准测试的运行时间中看到的差异：

(pprof) list BenchmarkRotateRight2
Total: 1.89s
ROUTINE ======================== main_test.BenchmarkRotateRight2 in (edited)
     180ms      1.89s (flat, cum)   100% of Total
         .          .     53:func BenchmarkRotateRight2(b *testing.B) {
         .          .     54:	nums := make([]int, 5_000)
         .          .     55:
         .          .     56:	b.ResetTimer()
         .          .     57:	b.ReportAllocs()
     130ms      130ms     58:	for i := 0; i &lt; b.N; i++ {
      50ms      1.76s     59:		rotateRight2(nums, 0)
         .          .     60:	}
         .          .     61:}

如果你使用-gcflags '-m -m'（双重-m或-m=2）运行基准测试，你将看到编译器优化代码的一些决策（参考文档：https://pkg.go.dev/cmd/compile）：

$ go test -gcflags '-m -m' -benchmem -bench . main_test.go

# command-line-arguments_test [command-line-arguments.test]
./main_test.go:7:6: can inline rotateRight1 with cost 40 as: func([]int, int) { n = n % len(nums); if n == 0 { return  }; lastNDigits := make([]int, n); copy(lastNDigits, nums[len(nums) - n:]); copy(nums[n:], nums[:len(nums) - n]); copy(nums[:n], lastNDigits) }
./main_test.go:20:6: cannot inline rotateRight2: function too complex: cost 83 exceeds budget 80
...

因此，基于复杂度阈值，编译器决定是否进行优化。

你可以在要禁用内联优化的函数之前使用//go:noinline指令。这将覆盖编译器通常的优化规则：

//go:noinline
func rotateRight1(nums []int, n int) {
...
}

现在你会注意到基准测试非常相似：

cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkRotateRight1
BenchmarkRotateRight1-16    	135554571	         8.886 ns/op	       0 B/op	       0 allocs/op
BenchmarkRotateRight2
BenchmarkRotateRight2-16    	143716638	         8.775 ns/op	       0 B/op	       0 allocs/op
PASS

英文:

What you see is the result of compiler optimization. The compiler is clever enough to make the applications faster by inlining function calls. Sometimes, the compiler optimizes to remove function calls and artificially lowers the run time of benchmarks (that can be tricky).

After profiling, we can notice the function rotateRight1 is not even being called during the benchmark execution:

(pprof) list BenchmarkRotateRight1
Total: 260ms
ROUTINE ======================== main_test.BenchmarkRotateRight1 in (edited)
     260ms      260ms (flat, cum)   100% of Total
         .          .     43:func BenchmarkRotateRight1(b *testing.B) {
         .          .     44:	nums := make([]int, 5_000)
         .          .     45:
         .          .     46:	b.ResetTimer()
         .          .     47:	b.ReportAllocs()
     260ms      260ms     48:	for i := 0; i &lt; b.N; i++ {
         .          .     49:		rotateRight1(nums, 0)
         .          .     50:	}
         .          .     51:}

On the other hand, rotateRight2 is being called, and that's why you see that difference in the run time of the benchmarks:

(pprof) list BenchmarkRotateRight2
Total: 1.89s
ROUTINE ======================== main_test.BenchmarkRotateRight2 in (edited)
     180ms      1.89s (flat, cum)   100% of Total
         .          .     53:func BenchmarkRotateRight2(b *testing.B) {
         .          .     54:	nums := make([]int, 5_000)
         .          .     55:
         .          .     56:	b.ResetTimer()
         .          .     57:	b.ReportAllocs()
     130ms      130ms     58:	for i := 0; i &lt; b.N; i++ {
      50ms      1.76s     59:		rotateRight2(nums, 0)
         .          .     60:	}
         .          .     61:}

If you run your benchmark with -gcflags '-m -m' (double -m or -m=2), you will see some of the compiler decisions to optimize the code (reference):

$ go test -gcflags &#39;-m -m&#39; -benchmem -bench . main_test.go

# command-line-arguments_test [command-line-arguments.test]
./main_test.go:7:6: can inline rotateRight1 with cost 40 as: func([]int, int) { n = n % len(nums); if n == 0 { return  }; lastNDigits := make([]int, n); copy(lastNDigits, nums[len(nums) - n:]); copy(nums[n:], nums[:len(nums) - n]); copy(nums[:n], lastNDigits) }
./main_test.go:20:6: cannot inline rotateRight2: function too complex: cost 83 exceeds budget 80
...

So, based on a complexity threshold, the compiler decides whether to optimize or not.

You can use the//go:noinline directive right before the function you want to disable inlining optimization though. That will override the compiler's usual optimization rules:

//go:noinline
func rotateRight1(nums []int, n int) {
...
}

Now you will notice the benchmarks are very similar:

cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkRotateRight1
BenchmarkRotateRight1-16    	135554571	         8.886 ns/op	       0 B/op	       0 allocs/op
BenchmarkRotateRight2
BenchmarkRotateRight2-16    	143716638	         8.775 ns/op	       0 B/op	       0 allocs/op
PASS

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

相同早期返回语句的函数的不同基准测试结果

问题

答案1

为什么http.Client{}前面有一个&符号？

SIGTRAP：在运行go test时，Golang包装C库时发生的跟踪陷阱错误。

What is the meaning of "ConnectEx tcp: The semaphore timeout period has expired."

mgo在多次session.Copy()之后存在连接泄漏。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论