函数调用导致性能下降

huangapple go评论82阅读模式
英文:

Performance decrease with function call

问题

对于以下函数:

func CycleClock(c *ballclock.Clock) int {
    for i := 0; i < fiveMinutesPerDay; i++ {
        c.TickFive()
    }

    return 1 + CalculateBallCycle(append([]int{}, c.BallQueue...))
}

其中c.BallQueue被定义为[]intCalculateBallCycle被定义为func CalculateBallCycle(s []int) int。在for循环和return语句之间,我遇到了巨大的性能下降。

我编写了以下基准测试来测试性能。第一个基准测试整个函数,第二个基准测试for循环,而第三个基准测试CalculateBallCycle函数:

func BenchmarkCycleClock(b *testing.B) {
    for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
        j := i
        b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
            for n := 0; n < b.N; n++ {
                c, _ := ballclock.NewClock(j)

                CycleClock(c)
            }
        })
    }
}

func BenchmarkCycle24(b *testing.B) {
    for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
        j := i
        b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
            for n := 0; n < b.N; n++ {
                c, _ := ballclock.NewClock(j)

                for k := 0; k < fiveMinutesPerDay; k++ {
                    c.TickFive()
                }
            }
        })
    }
}

func BenchmarkCalculateBallCycle123(b *testing.B) {
    m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}

    for n := 0; n < b.N; n++ {
        CalculateBallCycle(m)
    }
}

使用123个球,得到以下结果:

BenchmarkCycleClock/BallCount=123-8                  200           9254136 ns/op
BenchmarkCycle24/BallCount=123-8                  200000              7610 ns/op
BenchmarkCalculateBallCycle123-8                 3000000               456 ns/op

从这个结果来看,基准测试之间存在巨大的差异。我期望第一个基准测试的时间大约为~8000 ns/op,因为那是各个部分的总和。

这里是GitHub存储库。

编辑:

我发现基准测试的结果与运行程序的结果相差很大。我采用了@yazgazan发现的方法,并修改了main.go中的基准测试函数,使其与main_test.go中的BenchmarkCalculateBallCycle123函数有些相似:

func Benchmark() {
    for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
        if i != 123 {
            continue
        }

        start := time.Now()

        t := CalculateBallCycle([]int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16})

        duration := time.Since(start)

        fmt.Printf("Ballclock with %v balls took %s;\n", i, duration)
    }
}

这给出了以下输出:

Ballclock with 123 balls took 11.86748ms;

如你所见,总时间为11.86毫秒,其中全部时间都花在了CalculateBallCycle函数上。是什么导致基准测试运行时间为456 ns/op,而运行程序的时间约为11867480 ms/op

英文:

For the following function:

func CycleClock(c *ballclock.Clock) int {
for i := 0; i &lt; fiveMinutesPerDay; i++ {
c.TickFive()
}
return 1 + CalculateBallCycle(append([]int{}, c.BallQueue...))
}

where c.BallQueue is defined as []int and CalculateBallCycle is defined as func CalculateBallCycle(s []int) int. I am having a huge performance decrease between the for loop and the return statement.

I wrote the following benchmarks to test. The first benchmarks the entire function, the second benchmarks the for loop, while the third benchmarks the CalculateBallCycle function:

func BenchmarkCycleClock(b *testing.B) {
for i := ballclock.MinBalls; i &lt;= ballclock.MaxBalls; i++ {
j := i
b.Run(&quot;BallCount=&quot;+strconv.Itoa(i), func(b *testing.B) {
for n := 0; n &lt; b.N; n++ {
c, _ := ballclock.NewClock(j)
CycleClock(c)
}
})
}
}
func BenchmarkCycle24(b *testing.B) {
for i := ballclock.MinBalls; i &lt;= ballclock.MaxBalls; i++ {
j := i
b.Run(&quot;BallCount=&quot;+strconv.Itoa(i), func(b *testing.B) {
for n := 0; n &lt; b.N; n++ {
c, _ := ballclock.NewClock(j)
for k := 0; k &lt; fiveMinutesPerDay; k++ {
c.TickFive()
}
}
})
}
}
func BenchmarkCalculateBallCycle123(b *testing.B) {
m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
for n := 0; n &lt; b.N; n++ {
CalculateBallCycle(m)
}
}

Using 123 balls, this gives the following result:

BenchmarkCycleClock/BallCount=123-8                  200           9254136 ns/op
BenchmarkCycle24/BallCount=123-8                  200000              7610 ns/op
BenchmarkCalculateBallCycle123-8                 3000000               456 ns/op

Looking at this, there is a huge disparity between benchmarks. I would expect that the first benchmark would take roughly ~8000 ns/op since that would be the sum of the parts.

Here is the github repository.

EDIT:

I discovered that the result from the benchmark and the result from the running program are widely different. I took what @yazgazan found and modified the benchmark function in main.go mimic somewhat the BenchmarkCalculateBallCycle123 from main_test.go:

func Benchmark() {
for i := ballclock.MinBalls; i &lt;= ballclock.MaxBalls; i++ {
if i != 123 {
continue
}
start := time.Now()
t := CalculateBallCycle([]int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16})
duration := time.Since(start)
fmt.Printf(&quot;Ballclock with %v balls took %s;\n&quot;, i, duration)
}
}

This gave the output of:

Ballclock with 123 balls took 11.86748ms;

As you can see, the total time was 11.86 ms, all of which was spent in the CalculateBallCycle function. What would cause the benchmark to run in 456 ns/op while the running program runs in around 11867480 ms/op?

答案1

得分: 2

你写道CalcualteBallCycle()的设计意图是修改切片。

我无法对这种方法的正确性发表意见,但这就是为什么BenchmarkCalculateBallCycle123的基准时间如此不同。

在第一次运行时,它执行了预期的操作,但在后续运行中,它执行了完全不同的操作,因为你将不同的数据作为输入传递。

对修改后的代码进行基准测试:

func BenchmarkCalculateBallCycle123v2(b *testing.B) {
    m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
    for n := 0; n < b.N; n++ {
        tmp := append([]int{}, m...)
        CalculateBallCycle(tmp)
    }
}

这通过创建m的副本来解决了这个问题,这样CalculateBallCycle就会修改本地副本。

运行时间变得更像其他的结果:

BenchmarkCalculateBallCycle123-8       3000000       500 ns/op
BenchmarkCalculateBallCycle123v2-8         100   10483347 ns/op
英文:

You wrote that CalcualteBallCycle() modifies the slice by design.

I can't speak to correctness of that approach, but it is why benchmark time of BenchmarkCalculateBallCycle123 is so different.

On first run it does the expected thing but on subsequent runs it does something completely different, because you're passing different data as input.

Benchmark this modified code:

func BenchmarkCalculateBallCycle123v2(b *testing.B) {
m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
for n := 0; n &lt; b.N; n++ {
tmp := append([]int{}, m...)
CalculateBallCycle(tmp)
}
}

This works-around this behavior by making a copy of m, so that CalculateBallCycle modifies a local copy.

The running time becomes more like the others:

BenchmarkCalculateBallCycle123-8     	 3000000	       500 ns/op
BenchmarkCalculateBallCycle123v2-8   	     100	  10483347 ns/op

答案2

得分: 0

在你的CycleClock函数中,你正在复制c.BallQueue切片。你可以通过使用CalculateBallCycle(c.BallQueue)来显著提高性能(假设CalculateBallCycle不会修改切片)。

例如:

func Sum(values []int) int {
    sum := 0
    for _, v := range values {
        sum += v
    }

    return sum
}

func BenchmarkNoCopy(b *testing.B) {
    for n := 0; n < b.N; n++ {
        Sum(m)
    }
}

func BenchmarkWithCopy(b *testing.B) {
    for n := 0; n < b.N; n++ {
        Sum(append([]int{}, m...))
    }
}

// BenchmarkNoCopy-4     	20000000	        73.5 ns/op
// BenchmarkWithCopy-4   	 5000000	       306 ns/op
// PASS

以上是一个示例,展示了在性能测试中使用复制和不使用复制的差异。

英文:

In your CycleClock function, you are copying the c.BallQueue slice. You can improve performance significantly by using CalculateBallCycle(c.BallQueue) instead (assuming CalculateBallCycle doesn't modify the slice)

For example:

func Sum(values []int) int {
sum := 0
for _, v := range values {
sum += v
}
return sum
}
func BenchmarkNoCopy(b *testing.B) {
for n := 0; n &lt; b.N; n++ {
Sum(m)
}
}
func BenchmarkWithCopy(b *testing.B) {
for n := 0; n &lt; b.N; n++ {
Sum(append([]int{}, m...))
}
}
// BenchmarkNoCopy-4     	20000000	        73.5 ns/op
// BenchmarkWithCopy-4   	 5000000	       306 ns/op
// PASS

答案3

得分: 0

你的测试中有一个微妙的错误。

BenchmarkCycleClockBenchmarkCycle24 两个方法都在一个循环中运行基准测试,并将一个闭包传递给 b.Run。在这些闭包中,你使用循环变量 i 来初始化时钟,像这样:ballclock.NewClock(i)

问题是,所有匿名函数的实例共享同一个变量。而且,在测试运行时,循环已经结束,所有的时钟都使用相同的值进行初始化:ballclock.MaxBalls

你可以通过使用一个局部变量来解决这个问题:

for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
    i := i
    b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
        for n := 0; n < b.N; n++ {
            c, _ := ballclock.NewClock(i)

            CycleClock(c)
        }
    })
}

i := i 这一行代码存储了当前 i 的值的副本(对于每个匿名函数实例都是不同的)。

英文:

There is a subtle bug in your tests.

Both methods BenchmarkCycleClock and BenchmarkCycle24 run the benchmark in a for loop, passing a closure to b.Run. Inside of those closures you initialize the clocks using the loop variable i like this:ballclock.NewClock(i).

The problem is, that all instances of your anonymous function share the same variable. And, by the time the function is run by the test runner, the loop will be finished, and all of the clocks will be initialized using the same value: ballclock.MaxBalls.

You can fix this using a local variable:

for i := ballclock.MinBalls; i &lt;= ballclock.MaxBalls; i++ {
i := i
b.Run(&quot;BallCount=&quot;+strconv.Itoa(i), func(b *testing.B) {
for n := 0; n &lt; b.N; n++ {
c, _ := ballclock.NewClock(i)
CycleClock(c)
}
})
}

The line i := i stores a copy of the current value of i (different for each instance of your anonymous function).

huangapple
  • 本文由 发表于 2017年7月17日 22:50:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/45147353.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定