相同早期返回语句的函数的不同基准测试结果

huangapple go评论83阅读模式
英文:

Different Benchmark results for functions with same early return statement

问题

我已经为将输入数组向右旋转n次实现了两个函数。

两种实现方式都会在初始数组等于所有旋转后的结果数组时提前退出,即当n为0或数组长度的倍数时。

func rotateRight1(nums []int, n int) {
	n = n % len(nums)
	if n == 0 {
		return
	}

	lastNDigits := make([]int, n)
	copy(lastNDigits, nums[len(nums)-n:])

	copy(nums[n:], nums[:len(nums)-n])
	copy(nums[:n], lastNDigits)
}

func rotateRight2(nums []int, n int) {
	n = n % len(nums)
	if n == 0 {
		return
	}

	i := 0
	current := nums[i]
	iAlreadySeen := i
	for j := 0; j < len(nums); j++ {
		nextI := (i + n) % len(nums)

		nums[nextI], current = current, nums[nextI]

		i = nextI

		// 处理长度为偶数的数组,其中i+k可能等于已经遍历过的索引
		if nextI == iAlreadySeen {
			i = (i + 1) % len(nums)
			iAlreadySeen = i
			current = nums[i]
		}
	}
}

在进行基准测试时,当n等于0时,两个函数的速度差异让我感到惊讶,差异达到了20倍。

func BenchmarkRotateRight1(b *testing.B) {
	nums := make([]int, 5_000)

	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		rotateRight1(nums, 0)
	}
}

func BenchmarkRotateRight2(b *testing.B) {
	nums := make([]int, 5_000)

	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		rotateRight2(nums, 0)
	}
}

运行go test -bench=.会得到如下结果:

cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
BenchmarkRotateRight1-4         1000000000               0.4603 ns/op          0 B/op          0 allocs/op
BenchmarkRotateRight2-4         97236492                12.11 ns/op            0 B/op          0 allocs/op
PASS

我不理解这种性能差异,因为两个函数基本上都在做相同的事情,并且在if k == 0条件下都会提前退出。

有人可以帮我理解这个问题吗?

go version go1.18 linux/amd64

英文:

I've implemented two functions for rotating an input array n times to the right.

Both implementations exit early, without performing anything, if the initial array would be equal to the resulting array after all rotations. This happens anytime n is 0 or a multiple of the length of the array.

func rotateRight1(nums []int, n int) {
	n = n % len(nums)
	if n == 0 {
		return
	}

	lastNDigits := make([]int, n)
	copy(lastNDigits, nums[len(nums)-n:])

	copy(nums[n:], nums[:len(nums)-n])
	copy(nums[:n], lastNDigits)
}

and

func rotateRight2(nums []int, n int) {
	n = n % len(nums)
	if n == 0 {
		return
	}

	i := 0
	current := nums[i]
	iAlreadySeen := i
	for j := 0; j &lt; len(nums); j++ {
		nextI := (i + n) % len(nums)

		nums[nextI], current = current, nums[nextI]

		i = nextI

		// handle even length arrays where i+k might equal an already seen index
		if nextI == iAlreadySeen {
			i = (i + 1) % len(nums)
			iAlreadySeen = i
			current = nums[i]
		}
	}
}

When benchmarking, I was surprised to see a +20x difference in speed when n equals 0 for both functions.

func BenchmarkRotateRight1(b *testing.B) {
	nums := make([]int, 5_000)

	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i &lt; b.N; i++ {
		rotateRight1(nums, 0)
	}
}

func BenchmarkRotateRight2(b *testing.B) {
	nums := make([]int, 5_000)

	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i &lt; b.N; i++ {
		rotateRight2(nums, 0)
	}
}

go test -bench=. yields a result like this consistently:

cpu: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
BenchmarkRotateRight1-4         1000000000               0.4603 ns/op          0 B/op          0 allocs/op
BenchmarkRotateRight2-4         97236492                12.11 ns/op            0 B/op          0 allocs/op
PASS

I don't understand this performance difference as both functions are basically doing the same thing and exiting early in the if k == 0 condition.

Could someone help me understand this?

go version go1.18 linux/amd64

答案1

得分: 3

你所看到的是编译器优化的结果。编译器足够聪明,通过内联函数调用来加快应用程序的速度。有时,编译器会优化以消除函数调用,并人为降低基准测试的运行时间(这可能会有些棘手)。

在进行分析之后,我们可以注意到函数rotateRight1在基准测试执行期间甚至没有被调用:

(pprof) list BenchmarkRotateRight1
Total: 260ms
ROUTINE ======================== main_test.BenchmarkRotateRight1 in (edited)
     260ms      260ms (flat, cum)   100% of Total
         .          .     43:func BenchmarkRotateRight1(b *testing.B) {
         .          .     44:	nums := make([]int, 5_000)
         .          .     45:
         .          .     46:	b.ResetTimer()
         .          .     47:	b.ReportAllocs()
     260ms      260ms     48:	for i := 0; i &lt; b.N; i++ {
         .          .     49:		rotateRight1(nums, 0)
         .          .     50:	}
         .          .     51:}

另一方面,rotateRight2被调用了,这就是你在基准测试的运行时间中看到的差异:

(pprof) list BenchmarkRotateRight2
Total: 1.89s
ROUTINE ======================== main_test.BenchmarkRotateRight2 in (edited)
     180ms      1.89s (flat, cum)   100% of Total
         .          .     53:func BenchmarkRotateRight2(b *testing.B) {
         .          .     54:	nums := make([]int, 5_000)
         .          .     55:
         .          .     56:	b.ResetTimer()
         .          .     57:	b.ReportAllocs()
     130ms      130ms     58:	for i := 0; i &lt; b.N; i++ {
      50ms      1.76s     59:		rotateRight2(nums, 0)
         .          .     60:	}
         .          .     61:}

如果你使用-gcflags '-m -m'(双重-m-m=2)运行基准测试,你将看到编译器优化代码的一些决策(参考文档:https://pkg.go.dev/cmd/compile):

$ go test -gcflags '-m -m' -benchmem -bench . main_test.go

# command-line-arguments_test [command-line-arguments.test]
./main_test.go:7:6: can inline rotateRight1 with cost 40 as: func([]int, int) { n = n % len(nums); if n == 0 { return  }; lastNDigits := make([]int, n); copy(lastNDigits, nums[len(nums) - n:]); copy(nums[n:], nums[:len(nums) - n]); copy(nums[:n], lastNDigits) }
./main_test.go:20:6: cannot inline rotateRight2: function too complex: cost 83 exceeds budget 80
...

因此,基于复杂度阈值,编译器决定是否进行优化。

你可以在要禁用内联优化的函数之前使用//go:noinline指令。这将覆盖编译器通常的优化规则:

//go:noinline
func rotateRight1(nums []int, n int) {
...
}

现在你会注意到基准测试非常相似:

cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkRotateRight1
BenchmarkRotateRight1-16    	135554571	         8.886 ns/op	       0 B/op	       0 allocs/op
BenchmarkRotateRight2
BenchmarkRotateRight2-16    	143716638	         8.775 ns/op	       0 B/op	       0 allocs/op
PASS
英文:

What you see is the result of compiler optimization. The compiler is clever enough to make the applications faster by inlining function calls. Sometimes, the compiler optimizes to remove function calls and artificially lowers the run time of benchmarks (that can be tricky).

After profiling, we can notice the function rotateRight1 is not even being called during the benchmark execution:

(pprof) list BenchmarkRotateRight1
Total: 260ms
ROUTINE ======================== main_test.BenchmarkRotateRight1 in (edited)
     260ms      260ms (flat, cum)   100% of Total
         .          .     43:func BenchmarkRotateRight1(b *testing.B) {
         .          .     44:	nums := make([]int, 5_000)
         .          .     45:
         .          .     46:	b.ResetTimer()
         .          .     47:	b.ReportAllocs()
     260ms      260ms     48:	for i := 0; i &lt; b.N; i++ {
         .          .     49:		rotateRight1(nums, 0)
         .          .     50:	}
         .          .     51:}

On the other hand, rotateRight2 is being called, and that's why you see that difference in the run time of the benchmarks:

(pprof) list BenchmarkRotateRight2
Total: 1.89s
ROUTINE ======================== main_test.BenchmarkRotateRight2 in (edited)
     180ms      1.89s (flat, cum)   100% of Total
         .          .     53:func BenchmarkRotateRight2(b *testing.B) {
         .          .     54:	nums := make([]int, 5_000)
         .          .     55:
         .          .     56:	b.ResetTimer()
         .          .     57:	b.ReportAllocs()
     130ms      130ms     58:	for i := 0; i &lt; b.N; i++ {
      50ms      1.76s     59:		rotateRight2(nums, 0)
         .          .     60:	}
         .          .     61:}

If you run your benchmark with -gcflags &#39;-m -m&#39; (double -m or -m=2), you will see some of the compiler decisions to optimize the code (reference):

$ go test -gcflags &#39;-m -m&#39; -benchmem -bench . main_test.go

# command-line-arguments_test [command-line-arguments.test]
./main_test.go:7:6: can inline rotateRight1 with cost 40 as: func([]int, int) { n = n % len(nums); if n == 0 { return  }; lastNDigits := make([]int, n); copy(lastNDigits, nums[len(nums) - n:]); copy(nums[n:], nums[:len(nums) - n]); copy(nums[:n], lastNDigits) }
./main_test.go:20:6: cannot inline rotateRight2: function too complex: cost 83 exceeds budget 80
...

So, based on a complexity threshold, the compiler decides whether to optimize or not.

You can use the//go:noinline directive right before the function you want to disable inlining optimization though. That will override the compiler's usual optimization rules:

//go:noinline
func rotateRight1(nums []int, n int) {
...
}

Now you will notice the benchmarks are very similar:

cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkRotateRight1
BenchmarkRotateRight1-16    	135554571	         8.886 ns/op	       0 B/op	       0 allocs/op
BenchmarkRotateRight2
BenchmarkRotateRight2-16    	143716638	         8.775 ns/op	       0 B/op	       0 allocs/op
PASS

huangapple
  • 本文由 发表于 2022年7月6日 21:55:54
  • 转载请务必保留本文链接:https://go.coder-hub.com/72884848.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定