如何对修改输入的函数进行基准测试?

huangapple go评论116阅读模式
英文:

How do I benchmark a function that modifies its input?

问题

当我对修改输入的函数进行基准测试时,我必须为每个基准测试循环复制测试数据,并在复制数据时暂停计时器。这意味着如果我运行go test -bench MyTest -benchtime 1s,测试可能需要2分钟而不是1秒。

我是不是做错了什么,还是我只能接受这个情况?


更多背景信息:

我正在编写一个用于读取syslog日志的程序。我的日志记录范例的一部分是,日志消息的第一行包含可读的文本,后续行包含“额外信息”,例如堆栈跟踪。因此,我的日志读取器(除其他功能外)会在第一个换行符处拆分消息,该换行符由rsyslog转义为#012

以下是该功能的代码:

  1. // Splits the main line from extra information
  2. func splitMessageExtra(line *string) string {
  3. var prev rune
  4. for i, char := range *line {
  5. if prev == 0 && char == '#' {
  6. prev = char
  7. continue
  8. }
  9. if prev == '#' && char == '0' {
  10. prev = char
  11. continue
  12. }
  13. if prev == '0' && char == '1' {
  14. prev = char
  15. continue
  16. }
  17. if prev == '1' && char == '2' {
  18. extra := (*line)[i+1:]
  19. *line = (*line)[0 : i-3]
  20. return extra
  21. }
  22. prev = 0
  23. }
  24. return ""
  25. }

它最初使用了strings.Split并返回新的字符串,但是CPU分析显示它太慢了。

以下是基准测试函数:

  1. var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`
  2. func BenchmarkSplitMessageExtra(b *testing.B) {
  3. for i := 0; i < b.N; i++ {
  4. b.StopTimer()
  5. msg := string([]byte(testMessage))
  6. b.StartTimer()
  7. splitMessageExtra(&msg)
  8. }
  9. }

以下是不暂停计时器运行的结果:

  1. $ go test -bench SplitMessageExtra -benchtime 1s
  2. BenchmarkSplitMessageExtra-8 3000000 434 ns/op
  3. PASS
  4. ok github.com/Hubro/logreader 1.730s

以下是使用上述完整基准测试函数运行的结果:

  1. $ go test -bench SplitMessageExtra -benchtime 1s
  2. BenchmarkSplitMessageExtra-8 5000000 385 ns/op
  3. PASS
  4. ok github.com/Hubro/logreader 100.563s

注意,运行时间非常长。

英文:

When I'm benchmarking a function that modifies its input, I have to copy the test data for each loop of the benchmark, and pause the timer while I'm doing so. This can mean that if I run go test -bench MyTest -benchtime 1s the test can take 2 full minutes rather than 1 second.

Am I doing something wrong or will I just have to live with this?


More context:

I'm writing a program for reading syslog logs. Part of my logging paradigm is that the first line of a logged message contains readable text, and following lines contain "extra information", like a stack trace. My log reader therefore (among other things) splits the message on the first line break, which is escaped to #012 by rsyslog.

Here is the code for that:

  1. // Splits the main line from extra information
  2. func splitMessageExtra(line *string) string {
  3. var prev rune
  4. for i, char := range *line {
  5. if prev == 0 &amp;&amp; char == &#39;#&#39; {
  6. prev = char
  7. continue
  8. }
  9. if prev == &#39;#&#39; &amp;&amp; char == &#39;0&#39; {
  10. prev = char
  11. continue
  12. }
  13. if prev == &#39;0&#39; &amp;&amp; char == &#39;1&#39; {
  14. prev = char
  15. continue
  16. }
  17. if prev == &#39;1&#39; &amp;&amp; char == &#39;2&#39; {
  18. extra := (*line)[i+1:]
  19. *line = (*line)[0 : i-3]
  20. return extra
  21. }
  22. prev = 0
  23. }
  24. return &quot;&quot;
  25. }

It originally used strings.Split and returned new strings, but cpu profiling showed that it was way too slow.

Here is the benchmark function:

  1. var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread &quot;processor_mta03&quot;]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=&lt;&gt;, size=24431, nrcpt=1 (queue active)`
  2. func BenchmarkSplitMessageExtra(b *testing.B) {
  3. for i := 0; i &lt; b.N; i++ {
  4. b.StopTimer()
  5. msg := string([]byte(testMessage))
  6. b.StartTimer()
  7. splitMessageExtra(&amp;msg)
  8. }
  9. }

Here's a run without pausing the timer:

<pre>
$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8 3000000 434 ns/op
PASS
ok github.com/Hubro/logreader 1.730s
</pre>

And here's a run with the exact benchmark function above:

<pre>
$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8 5000000 385 ns/op
PASS
ok github.com/Hubro/logreader 100.563s
</pre>

Notice it takes AGES to run.

答案1

得分: 1

你的代码和基准测试确实很慢。这里有一个更快的版本。

  1. package main
  2. import (
  3. "strings"
  4. "testing"
  5. )
  6. // 将主行与额外信息分开
  7. func splitMessageExtra(line *string) string {
  8. const newline = "#012"
  9. i := strings.Index(*line, newline)
  10. if i < 0 {
  11. return ""
  12. }
  13. extra := (*line)[i+len(newline):]
  14. *line = (*line)[0:i]
  15. return extra
  16. }
  17. var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`
  18. func BenchmarkSplitMessageExtra(b *testing.B) {
  19. for i := 0; i < b.N; i++ {
  20. msg := testMessage
  21. splitMessageExtra(&msg)
  22. }
  23. }

输出结果:

  1. $ go test -bench=.
  2. goos: linux
  3. goarch: amd64
  4. pkg: extra
  5. BenchmarkSplitMessageExtra-4 50000000 32.2 ns/op
  6. PASS
  7. ok extra 1.647s

为了比较,这里是你的代码和基准测试的结果。你的代码和基准测试比我的代码和基准测试慢:968 ns/op 和 50.184s 对比 32.2 ns/op 和 1.647s。

  1. package main
  2. import (
  3. "testing"
  4. )
  5. // 将主行与额外信息分开
  6. func splitMessageExtra(line *string) string {
  7. var prev rune
  8. for i, char := range *line {
  9. if prev == 0 && char == '#' {
  10. prev = char
  11. continue
  12. }
  13. if prev == '#' && char == '0' {
  14. prev = char
  15. continue
  16. }
  17. if prev == '0' && char == '1' {
  18. prev = char
  19. continue
  20. }
  21. if prev == '1' && char == '2' {
  22. extra := (*line)[i+1:]
  23. *line = (*line)[0 : i-3]
  24. return extra
  25. }
  26. prev = 0
  27. }
  28. return ""
  29. }
  30. var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`
  31. func BenchmarkSplitMessageExtra(b *testing.B) {
  32. for i := 0; i < b.N; i++ {
  33. b.StopTimer()
  34. msg := string([]byte(testMessage))
  35. b.StartTimer()
  36. splitMessageExtra(&msg)
  37. }
  38. }

输出结果:

  1. $ go test -bench=.
  2. goos: linux
  3. goarch: amd64
  4. pkg: extra
  5. BenchmarkSplitMessageExtra-4 2000000 968 ns/op
  6. PASS
  7. ok extra 50.184s

你的一些代码是不必要的,它们使用了CPU时间并触发了内存分配。例如,将UTF-8字节转换为符文,for i, char := range *line {},以及将string转换为[]byte再转换回stringstring([]byte(testMessage))。一些算法可以改进,例如搜索换行符。

英文:

Your code and benchmark do seem slow. Here's a faster version.

  1. package main
  2. import (
  3. &quot;strings&quot;
  4. &quot;testing&quot;
  5. )
  6. // Splits the main line from extra information
  7. func splitMessageExtra(line *string) string {
  8. const newline = &quot;#012&quot;
  9. i := strings.Index(*line, newline)
  10. if i &lt; 0 {
  11. return &quot;&quot;
  12. }
  13. extra := (*line)[i+len(newline):]
  14. *line = (*line)[0:i]
  15. return extra
  16. }
  17. var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread &quot;processor_mta03&quot;]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=&lt;&gt;, size=24431, nrcpt=1 (queue active)`
  18. func BenchmarkSplitMessageExtra(b *testing.B) {
  19. for i := 0; i &lt; b.N; i++ {
  20. msg := testMessage
  21. splitMessageExtra(&amp;msg)
  22. }
  23. }

Output:

  1. $ go test -bench=.
  2. goos: linux
  3. goarch: amd64
  4. pkg: extra
  5. BenchmarkSplitMessageExtra-4 50000000 32.2 ns/op
  6. PASS
  7. ok extra 1.647s

For comparison, here are the results from your code and benchmark. Your code and benchmark are slower than mine: 968 ns/op and 50.184s versus 32.2 ns/op and 1.647s respectively.

  1. package main
  2. import (
  3. &quot;testing&quot;
  4. )
  5. // Splits the main line from extra information
  6. func splitMessageExtra(line *string) string {
  7. var prev rune
  8. for i, char := range *line {
  9. if prev == 0 &amp;&amp; char == &#39;#&#39; {
  10. prev = char
  11. continue
  12. }
  13. if prev == &#39;#&#39; &amp;&amp; char == &#39;0&#39; {
  14. prev = char
  15. continue
  16. }
  17. if prev == &#39;0&#39; &amp;&amp; char == &#39;1&#39; {
  18. prev = char
  19. continue
  20. }
  21. if prev == &#39;1&#39; &amp;&amp; char == &#39;2&#39; {
  22. extra := (*line)[i+1:]
  23. *line = (*line)[0 : i-3]
  24. return extra
  25. }
  26. prev = 0
  27. }
  28. return &quot;&quot;
  29. }
  30. var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread &quot;processor_mta03&quot;]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=&lt;&gt;, size=24431, nrcpt=1 (queue active)`
  31. func BenchmarkSplitMessageExtra(b *testing.B) {
  32. for i := 0; i &lt; b.N; i++ {
  33. b.StopTimer()
  34. msg := string([]byte(testMessage))
  35. b.StartTimer()
  36. splitMessageExtra(&amp;msg)
  37. }
  38. }

Output:

  1. $ go test -bench=.
  2. goos: linux
  3. goarch: amd64
  4. pkg: extra
  5. BenchmarkSplitMessageExtra-4 2000000 968 ns/op
  6. PASS
  7. ok extra 50.184s

Some of your code is unnecessary; it uses CPU time and triggers allocations. For example, converting to utf-8 bytes to runes,for i, char := range *line {}, and converting string to []byte to string, string([]byte(testMessage)) . Some algorithms could be improved. For example, searching for a newline.

答案2

得分: 0

这里没有什么问题。StopTimerStartTimer的开销应该比splitMessageExtra大得多。它们都调用了runtime.ReadMemStats。请参考这里

英文:

There is nothing wrong. StopTimer and StartTimer should be much more expensive than splitMessageExtra. Both of them call runtime.ReadMemStats. See here.

huangapple
  • 本文由 发表于 2017年2月26日 08:06:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/42463174.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定