2017年7月17日 22:50:14go评论87阅读模式

英文:

Performance decrease with function call

问题

对于以下函数：

func CycleClock(c *ballclock.Clock) int {
    for i := 0; i < fiveMinutesPerDay; i++ {
        c.TickFive()
    }

    return 1 + CalculateBallCycle(append([]int{}, c.BallQueue...))
}

其中c.BallQueue被定义为[]int，CalculateBallCycle被定义为func CalculateBallCycle(s []int) int。在for循环和return语句之间，我遇到了巨大的性能下降。

我编写了以下基准测试来测试性能。第一个基准测试整个函数，第二个基准测试for循环，而第三个基准测试CalculateBallCycle函数：

func BenchmarkCycleClock(b *testing.B) {
    for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
        j := i
        b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
            for n := 0; n < b.N; n++ {
                c, _ := ballclock.NewClock(j)

                CycleClock(c)
            }
        })
    }
}

func BenchmarkCycle24(b *testing.B) {
    for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
        j := i
        b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
            for n := 0; n < b.N; n++ {
                c, _ := ballclock.NewClock(j)

                for k := 0; k < fiveMinutesPerDay; k++ {
                    c.TickFive()
                }
            }
        })
    }
}

func BenchmarkCalculateBallCycle123(b *testing.B) {
    m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}

    for n := 0; n < b.N; n++ {
        CalculateBallCycle(m)
    }
}

使用123个球，得到以下结果：

BenchmarkCycleClock/BallCount=123-8                  200           9254136 ns/op
BenchmarkCycle24/BallCount=123-8                  200000              7610 ns/op
BenchmarkCalculateBallCycle123-8                 3000000               456 ns/op

从这个结果来看，基准测试之间存在巨大的差异。我期望第一个基准测试的时间大约为~8000 ns/op，因为那是各个部分的总和。

这里是GitHub存储库。

编辑：

我发现基准测试的结果与运行程序的结果相差很大。我采用了@yazgazan发现的方法，并修改了main.go中的基准测试函数，使其与main_test.go中的BenchmarkCalculateBallCycle123函数有些相似：

func Benchmark() {
    for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
        if i != 123 {
            continue
        }

        start := time.Now()

        t := CalculateBallCycle([]int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16})

        duration := time.Since(start)

        fmt.Printf("Ballclock with %v balls took %s;\n", i, duration)
    }
}

这给出了以下输出：

Ballclock with 123 balls took 11.86748ms;

如你所见，总时间为11.86毫秒，其中全部时间都花在了CalculateBallCycle函数上。是什么导致基准测试运行时间为456 ns/op，而运行程序的时间约为11867480 ms/op？

英文:

For the following function:

func CycleClock(c *ballclock.Clock) int {
for i := 0; i &lt; fiveMinutesPerDay; i++ {
c.TickFive()
}
return 1 + CalculateBallCycle(append([]int{}, c.BallQueue...))
}

where c.BallQueue is defined as []int and CalculateBallCycle is defined as func CalculateBallCycle(s []int) int. I am having a huge performance decrease between the for loop and the return statement.

I wrote the following benchmarks to test. The first benchmarks the entire function, the second benchmarks the for loop, while the third benchmarks the CalculateBallCycle function:

func BenchmarkCycleClock(b *testing.B) {
for i := ballclock.MinBalls; i &lt;= ballclock.MaxBalls; i++ {
j := i
b.Run(&quot;BallCount=&quot;+strconv.Itoa(i), func(b *testing.B) {
for n := 0; n &lt; b.N; n++ {
c, _ := ballclock.NewClock(j)
CycleClock(c)
}
})
}
}
func BenchmarkCycle24(b *testing.B) {
for i := ballclock.MinBalls; i &lt;= ballclock.MaxBalls; i++ {
j := i
b.Run(&quot;BallCount=&quot;+strconv.Itoa(i), func(b *testing.B) {
for n := 0; n &lt; b.N; n++ {
c, _ := ballclock.NewClock(j)
for k := 0; k &lt; fiveMinutesPerDay; k++ {
c.TickFive()
}
}
})
}
}
func BenchmarkCalculateBallCycle123(b *testing.B) {
m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
for n := 0; n &lt; b.N; n++ {
CalculateBallCycle(m)
}
}

Using 123 balls, this gives the following result:

BenchmarkCycleClock/BallCount=123-8                  200           9254136 ns/op
BenchmarkCycle24/BallCount=123-8                  200000              7610 ns/op
BenchmarkCalculateBallCycle123-8                 3000000               456 ns/op

Looking at this, there is a huge disparity between benchmarks. I would expect that the first benchmark would take roughly ~8000 ns/op since that would be the sum of the parts.

Here is the github repository.

EDIT:

I discovered that the result from the benchmark and the result from the running program are widely different. I took what @yazgazan found and modified the benchmark function in main.go mimic somewhat the BenchmarkCalculateBallCycle123 from main_test.go:

func Benchmark() {
for i := ballclock.MinBalls; i &lt;= ballclock.MaxBalls; i++ {
if i != 123 {
continue
}
start := time.Now()
t := CalculateBallCycle([]int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16})
duration := time.Since(start)
fmt.Printf(&quot;Ballclock with %v balls took %s;\n&quot;, i, duration)
}
}

This gave the output of:

Ballclock with 123 balls took 11.86748ms;

As you can see, the total time was 11.86 ms, all of which was spent in the CalculateBallCycle function. What would cause the benchmark to run in 456 ns/op while the running program runs in around 11867480 ms/op?

答案1

得分: 2

你写道CalcualteBallCycle()的设计意图是修改切片。

我无法对这种方法的正确性发表意见，但这就是为什么BenchmarkCalculateBallCycle123的基准时间如此不同。

在第一次运行时，它执行了预期的操作，但在后续运行中，它执行了完全不同的操作，因为你将不同的数据作为输入传递。

对修改后的代码进行基准测试：

func BenchmarkCalculateBallCycle123v2(b *testing.B) {
    m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
    for n := 0; n < b.N; n++ {
        tmp := append([]int{}, m...)
        CalculateBallCycle(tmp)
    }
}

这通过创建m的副本来解决了这个问题，这样CalculateBallCycle就会修改本地副本。

运行时间变得更像其他的结果：

BenchmarkCalculateBallCycle123-8       3000000       500 ns/op
BenchmarkCalculateBallCycle123v2-8         100   10483347 ns/op

英文:

You wrote that CalcualteBallCycle() modifies the slice by design.

I can't speak to correctness of that approach, but it is why benchmark time of BenchmarkCalculateBallCycle123 is so different.

On first run it does the expected thing but on subsequent runs it does something completely different, because you're passing different data as input.

Benchmark this modified code:

func BenchmarkCalculateBallCycle123v2(b *testing.B) {
m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
for n := 0; n &lt; b.N; n++ {
tmp := append([]int{}, m...)
CalculateBallCycle(tmp)
}
}

This works-around this behavior by making a copy of m, so that CalculateBallCycle modifies a local copy.

The running time becomes more like the others:

BenchmarkCalculateBallCycle123-8     	 3000000	       500 ns/op
BenchmarkCalculateBallCycle123v2-8   	     100	  10483347 ns/op

答案2

得分: 0

在你的CycleClock函数中，你正在复制c.BallQueue切片。你可以通过使用CalculateBallCycle(c.BallQueue)来显著提高性能（假设CalculateBallCycle不会修改切片）。

例如：

func Sum(values []int) int {
    sum := 0
    for _, v := range values {
        sum += v
    }

    return sum
}

func BenchmarkNoCopy(b *testing.B) {
    for n := 0; n < b.N; n++ {
        Sum(m)
    }
}

func BenchmarkWithCopy(b *testing.B) {
    for n := 0; n < b.N; n++ {
        Sum(append([]int{}, m...))
    }
}

// BenchmarkNoCopy-4     	20000000	        73.5 ns/op
// BenchmarkWithCopy-4   	 5000000	       306 ns/op
// PASS

以上是一个示例，展示了在性能测试中使用复制和不使用复制的差异。

英文:

In your CycleClock function, you are copying the c.BallQueue slice. You can improve performance significantly by using CalculateBallCycle(c.BallQueue) instead (assuming CalculateBallCycle doesn't modify the slice)

For example:

func Sum(values []int) int {
sum := 0
for _, v := range values {
sum += v
}
return sum
}
func BenchmarkNoCopy(b *testing.B) {
for n := 0; n &lt; b.N; n++ {
Sum(m)
}
}
func BenchmarkWithCopy(b *testing.B) {
for n := 0; n &lt; b.N; n++ {
Sum(append([]int{}, m...))
}
}
// BenchmarkNoCopy-4     	20000000	        73.5 ns/op
// BenchmarkWithCopy-4   	 5000000	       306 ns/op
// PASS

答案3

得分: 0

你的测试中有一个微妙的错误。

BenchmarkCycleClock 和 BenchmarkCycle24 两个方法都在一个循环中运行基准测试，并将一个闭包传递给 b.Run。在这些闭包中，你使用循环变量 i 来初始化时钟，像这样：ballclock.NewClock(i)。

问题是，所有匿名函数的实例共享同一个变量。而且，在测试运行时，循环已经结束，所有的时钟都使用相同的值进行初始化：ballclock.MaxBalls。

你可以通过使用一个局部变量来解决这个问题：

for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
    i := i
    b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
        for n := 0; n < b.N; n++ {
            c, _ := ballclock.NewClock(i)

            CycleClock(c)
        }
    })
}

i := i 这一行代码存储了当前 i 的值的副本（对于每个匿名函数实例都是不同的）。

英文:

There is a subtle bug in your tests.

Both methods BenchmarkCycleClock and BenchmarkCycle24 run the benchmark in a for loop, passing a closure to b.Run. Inside of those closures you initialize the clocks using the loop variable i like this:ballclock.NewClock(i).

The problem is, that all instances of your anonymous function share the same variable. And, by the time the function is run by the test runner, the loop will be finished, and all of the clocks will be initialized using the same value: ballclock.MaxBalls.

You can fix this using a local variable:

for i := ballclock.MinBalls; i &lt;= ballclock.MaxBalls; i++ {
i := i
b.Run(&quot;BallCount=&quot;+strconv.Itoa(i), func(b *testing.B) {
for n := 0; n &lt; b.N; n++ {
c, _ := ballclock.NewClock(i)
CycleClock(c)
}
})
}

The line i := i stores a copy of the current value of i (different for each instance of your anonymous function).

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

函数调用导致性能下降

问题

答案1

答案2

答案3

使用jsonapi将切片进行编组

在Golang中，如何使用套接字绑定到特定的地址/端口来接收组播数据包？

在cron作业中使用*gin.Context

当在goroutine内部进行I/O操作时，Go语言会阻塞当前线程吗？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论