如何对修改输入的函数进行基准测试?

huangapple go评论93阅读模式
英文:

How do I benchmark a function that modifies its input?

问题

当我对修改输入的函数进行基准测试时,我必须为每个基准测试循环复制测试数据,并在复制数据时暂停计时器。这意味着如果我运行go test -bench MyTest -benchtime 1s,测试可能需要2分钟而不是1秒。

我是不是做错了什么,还是我只能接受这个情况?


更多背景信息:

我正在编写一个用于读取syslog日志的程序。我的日志记录范例的一部分是,日志消息的第一行包含可读的文本,后续行包含“额外信息”,例如堆栈跟踪。因此,我的日志读取器(除其他功能外)会在第一个换行符处拆分消息,该换行符由rsyslog转义为#012

以下是该功能的代码:

// Splits the main line from extra information
func splitMessageExtra(line *string) string {
    var prev rune

    for i, char := range *line {
        if prev == 0 && char == '#' {
            prev = char
            continue
        }

        if prev == '#' && char == '0' {
            prev = char
            continue
        }

        if prev == '0' && char == '1' {
            prev = char
            continue
        }

        if prev == '1' && char == '2' {
            extra := (*line)[i+1:]
            *line = (*line)[0 : i-3]

            return extra
        }

        prev = 0
    }

    return ""
}

它最初使用了strings.Split并返回新的字符串,但是CPU分析显示它太慢了。

以下是基准测试函数:

var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`

func BenchmarkSplitMessageExtra(b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        msg := string([]byte(testMessage))
        b.StartTimer()

        splitMessageExtra(&msg)
    }
}

以下是不暂停计时器运行的结果:

$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8    3000000    434 ns/op
PASS
ok    github.com/Hubro/logreader    1.730s

以下是使用上述完整基准测试函数运行的结果:

$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8    5000000    385 ns/op
PASS
ok    github.com/Hubro/logreader    100.563s

注意,运行时间非常长。

英文:

When I'm benchmarking a function that modifies its input, I have to copy the test data for each loop of the benchmark, and pause the timer while I'm doing so. This can mean that if I run go test -bench MyTest -benchtime 1s the test can take 2 full minutes rather than 1 second.

Am I doing something wrong or will I just have to live with this?


More context:

I'm writing a program for reading syslog logs. Part of my logging paradigm is that the first line of a logged message contains readable text, and following lines contain "extra information", like a stack trace. My log reader therefore (among other things) splits the message on the first line break, which is escaped to #012 by rsyslog.

Here is the code for that:

// Splits the main line from extra information
func splitMessageExtra(line *string) string {
	var prev rune

	for i, char := range *line {
		if prev == 0 &amp;&amp; char == &#39;#&#39; {
			prev = char
			continue
		}

		if prev == &#39;#&#39; &amp;&amp; char == &#39;0&#39; {
			prev = char
			continue
		}

		if prev == &#39;0&#39; &amp;&amp; char == &#39;1&#39; {
			prev = char
			continue
		}

		if prev == &#39;1&#39; &amp;&amp; char == &#39;2&#39; {
			extra := (*line)[i+1:]
			*line = (*line)[0 : i-3]

			return extra
		}

		prev = 0
	}

	return &quot;&quot;
}

It originally used strings.Split and returned new strings, but cpu profiling showed that it was way too slow.

Here is the benchmark function:

var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread &quot;processor_mta03&quot;]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=&lt;&gt;, size=24431, nrcpt=1 (queue active)`

func BenchmarkSplitMessageExtra(b *testing.B) {
	for i := 0; i &lt; b.N; i++ {
		b.StopTimer()
		msg := string([]byte(testMessage))
		b.StartTimer()

		splitMessageExtra(&amp;msg)
	}
}

Here's a run without pausing the timer:

<pre>
$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8 3000000 434 ns/op
PASS
ok github.com/Hubro/logreader 1.730s
</pre>

And here's a run with the exact benchmark function above:

<pre>
$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8 5000000 385 ns/op
PASS
ok github.com/Hubro/logreader 100.563s
</pre>

Notice it takes AGES to run.

答案1

得分: 1

你的代码和基准测试确实很慢。这里有一个更快的版本。

package main

import (
	"strings"
	"testing"
)

// 将主行与额外信息分开
func splitMessageExtra(line *string) string {
	const newline = "#012"
	i := strings.Index(*line, newline)
	if i < 0 {
		return ""
	}
	extra := (*line)[i+len(newline):]
	*line = (*line)[0:i]
	return extra
}

var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`

func BenchmarkSplitMessageExtra(b *testing.B) {
	for i := 0; i < b.N; i++ {
		msg := testMessage
		splitMessageExtra(&msg)
	}
}

输出结果:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: extra
BenchmarkSplitMessageExtra-4   	50000000	        32.2 ns/op
PASS
ok  	extra	1.647s

为了比较,这里是你的代码和基准测试的结果。你的代码和基准测试比我的代码和基准测试慢:968 ns/op 和 50.184s 对比 32.2 ns/op 和 1.647s。

package main

import (
	"testing"
)

// 将主行与额外信息分开
func splitMessageExtra(line *string) string {
	var prev rune
	for i, char := range *line {
		if prev == 0 && char == '#' {
			prev = char
			continue
		}
		if prev == '#' && char == '0' {
			prev = char
			continue
		}
		if prev == '0' && char == '1' {
			prev = char
			continue
		}
		if prev == '1' && char == '2' {
			extra := (*line)[i+1:]
			*line = (*line)[0 : i-3]

			return extra
		}
		prev = 0
	}
	return ""
}

var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`

func BenchmarkSplitMessageExtra(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		msg := string([]byte(testMessage))
		b.StartTimer()
		splitMessageExtra(&msg)
	}
}

输出结果:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: extra
BenchmarkSplitMessageExtra-4    2000000	       968 ns/op	
PASS
ok  	extra	50.184s

你的一些代码是不必要的,它们使用了CPU时间并触发了内存分配。例如,将UTF-8字节转换为符文,for i, char := range *line {},以及将string转换为[]byte再转换回stringstring([]byte(testMessage))。一些算法可以改进,例如搜索换行符。

英文:

Your code and benchmark do seem slow. Here's a faster version.

package main

import (
	&quot;strings&quot;
	&quot;testing&quot;
)

// Splits the main line from extra information
func splitMessageExtra(line *string) string {
	const newline = &quot;#012&quot;
	i := strings.Index(*line, newline)
	if i &lt; 0 {
		return &quot;&quot;
	}
	extra := (*line)[i+len(newline):]
	*line = (*line)[0:i]
	return extra
}

var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread &quot;processor_mta03&quot;]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=&lt;&gt;, size=24431, nrcpt=1 (queue active)`

func BenchmarkSplitMessageExtra(b *testing.B) {
	for i := 0; i &lt; b.N; i++ {
		msg := testMessage
		splitMessageExtra(&amp;msg)
	}
}

Output:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: extra
BenchmarkSplitMessageExtra-4   	50000000	        32.2 ns/op
PASS
ok  	extra	1.647s

For comparison, here are the results from your code and benchmark. Your code and benchmark are slower than mine: 968 ns/op and 50.184s versus 32.2 ns/op and 1.647s respectively.

package main

import (
	&quot;testing&quot;
)

// Splits the main line from extra information
func splitMessageExtra(line *string) string {
	var prev rune
	for i, char := range *line {
		if prev == 0 &amp;&amp; char == &#39;#&#39; {
			prev = char
			continue
		}
		if prev == &#39;#&#39; &amp;&amp; char == &#39;0&#39; {
			prev = char
			continue
		}
		if prev == &#39;0&#39; &amp;&amp; char == &#39;1&#39; {
			prev = char
			continue
		}
		if prev == &#39;1&#39; &amp;&amp; char == &#39;2&#39; {
			extra := (*line)[i+1:]
			*line = (*line)[0 : i-3]

			return extra
		}
		prev = 0
	}
	return &quot;&quot;
}

var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread &quot;processor_mta03&quot;]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=&lt;&gt;, size=24431, nrcpt=1 (queue active)`

func BenchmarkSplitMessageExtra(b *testing.B) {
	for i := 0; i &lt; b.N; i++ {
		b.StopTimer()
		msg := string([]byte(testMessage))
		b.StartTimer()
		splitMessageExtra(&amp;msg)
	}
}

Output:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: extra
BenchmarkSplitMessageExtra-4   	 2000000	       968 ns/op	
PASS
ok  	extra	50.184s

Some of your code is unnecessary; it uses CPU time and triggers allocations. For example, converting to utf-8 bytes to runes,for i, char := range *line {}, and converting string to []byte to string, string([]byte(testMessage)) . Some algorithms could be improved. For example, searching for a newline.

答案2

得分: 0

这里没有什么问题。StopTimerStartTimer的开销应该比splitMessageExtra大得多。它们都调用了runtime.ReadMemStats。请参考这里

英文:

There is nothing wrong. StopTimer and StartTimer should be much more expensive than splitMessageExtra. Both of them call runtime.ReadMemStats. See here.

huangapple
  • 本文由 发表于 2017年2月26日 08:06:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/42463174.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定