2017年2月26日 08:06:23go评论116阅读模式

英文:

How do I benchmark a function that modifies its input?

问题

当我对修改输入的函数进行基准测试时，我必须为每个基准测试循环复制测试数据，并在复制数据时暂停计时器。这意味着如果我运行go test -bench MyTest -benchtime 1s，测试可能需要2分钟而不是1秒。

我是不是做错了什么，还是我只能接受这个情况？

更多背景信息：

我正在编写一个用于读取syslog日志的程序。我的日志记录范例的一部分是，日志消息的第一行包含可读的文本，后续行包含“额外信息”，例如堆栈跟踪。因此，我的日志读取器（除其他功能外）会在第一个换行符处拆分消息，该换行符由rsyslog转义为#012。

以下是该功能的代码：

// Splits the main line from extra information
func splitMessageExtra(line *string) string {
    var prev rune
    for i, char := range *line {
        if prev == 0 && char == '#' {
            prev = char
            continue
        }
        if prev == '#' && char == '0' {
            prev = char
            continue
        }
        if prev == '0' && char == '1' {
            prev = char
            continue
        }
        if prev == '1' && char == '2' {
            extra := (*line)[i+1:]
            *line = (*line)[0 : i-3]
            return extra
        }
        prev = 0
    }
    return ""
}

它最初使用了strings.Split并返回新的字符串，但是CPU分析显示它太慢了。

以下是基准测试函数：

var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`
func BenchmarkSplitMessageExtra(b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        msg := string([]byte(testMessage))
        b.StartTimer()
        splitMessageExtra(&msg)
    }
}

以下是不暂停计时器运行的结果：

$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8    3000000    434 ns/op
PASS
ok    github.com/Hubro/logreader    1.730s

以下是使用上述完整基准测试函数运行的结果：

$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8    5000000    385 ns/op
PASS
ok    github.com/Hubro/logreader    100.563s

注意，运行时间非常长。

英文:

When I'm benchmarking a function that modifies its input, I have to copy the test data for each loop of the benchmark, and pause the timer while I'm doing so. This can mean that if I run go test -bench MyTest -benchtime 1s the test can take 2 full minutes rather than 1 second.

Am I doing something wrong or will I just have to live with this?

More context:

I'm writing a program for reading syslog logs. Part of my logging paradigm is that the first line of a logged message contains readable text, and following lines contain "extra information", like a stack trace. My log reader therefore (among other things) splits the message on the first line break, which is escaped to #012 by rsyslog.

Here is the code for that:

// Splits the main line from extra information
func splitMessageExtra(line *string) string {
	var prev rune
	for i, char := range *line {
		if prev == 0 &amp;&amp; char == &#39;#&#39; {
			prev = char
			continue
		}
		if prev == &#39;#&#39; &amp;&amp; char == &#39;0&#39; {
			prev = char
			continue
		}
		if prev == &#39;0&#39; &amp;&amp; char == &#39;1&#39; {
			prev = char
			continue
		}
		if prev == &#39;1&#39; &amp;&amp; char == &#39;2&#39; {
			extra := (*line)[i+1:]
			*line = (*line)[0 : i-3]
			return extra
		}
		prev = 0
	}
	return &quot;&quot;
}

It originally used strings.Split and returned new strings, but cpu profiling showed that it was way too slow.

Here is the benchmark function:

var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread &quot;processor_mta03&quot;]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=&lt;&gt;, size=24431, nrcpt=1 (queue active)`
func BenchmarkSplitMessageExtra(b *testing.B) {
	for i := 0; i &lt; b.N; i++ {
		b.StopTimer()
		msg := string([]byte(testMessage))
		b.StartTimer()
		splitMessageExtra(&amp;msg)
	}
}

Here's a run without pausing the timer:

<pre>
$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8 3000000 434 ns/op
PASS
ok github.com/Hubro/logreader 1.730s
</pre>

And here's a run with the exact benchmark function above:

<pre>
$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8 5000000 385 ns/op
PASS
ok github.com/Hubro/logreader 100.563s
</pre>

Notice it takes AGES to run.

答案1

得分: 1

你的代码和基准测试确实很慢。这里有一个更快的版本。

package main
import (
	"strings"
	"testing"
)
// 将主行与额外信息分开
func splitMessageExtra(line *string) string {
	const newline = "#012"
	i := strings.Index(*line, newline)
	if i < 0 {
		return ""
	}
	extra := (*line)[i+len(newline):]
	*line = (*line)[0:i]
	return extra
}
var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`
func BenchmarkSplitMessageExtra(b *testing.B) {
	for i := 0; i < b.N; i++ {
		msg := testMessage
		splitMessageExtra(&msg)
	}
}

输出结果：

$ go test -bench=.
goos: linux
goarch: amd64
pkg: extra
BenchmarkSplitMessageExtra-4   	50000000	        32.2 ns/op
PASS
ok  	extra	1.647s

为了比较，这里是你的代码和基准测试的结果。你的代码和基准测试比我的代码和基准测试慢：968 ns/op 和 50.184s 对比 32.2 ns/op 和 1.647s。

package main
import (
	"testing"
)
// 将主行与额外信息分开
func splitMessageExtra(line *string) string {
	var prev rune
	for i, char := range *line {
		if prev == 0 && char == '#' {
			prev = char
			continue
		}
		if prev == '#' && char == '0' {
			prev = char
			continue
		}
		if prev == '0' && char == '1' {
			prev = char
			continue
		}
		if prev == '1' && char == '2' {
			extra := (*line)[i+1:]
			*line = (*line)[0 : i-3]
			return extra
		}
		prev = 0
	}
	return ""
}
var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`
func BenchmarkSplitMessageExtra(b *testing.B) {
	for i := 0; i < b.N; i++ {
		b.StopTimer()
		msg := string([]byte(testMessage))
		b.StartTimer()
		splitMessageExtra(&msg)
	}
}

输出结果：

$ go test -bench=.
goos: linux
goarch: amd64
pkg: extra
BenchmarkSplitMessageExtra-4    2000000	       968 ns/op	
PASS
ok  	extra	50.184s

你的一些代码是不必要的，它们使用了CPU时间并触发了内存分配。例如，将UTF-8字节转换为符文，for i, char := range *line {}，以及将string转换为[]byte再转换回string，string([]byte(testMessage))。一些算法可以改进，例如搜索换行符。

英文:

Your code and benchmark do seem slow. Here's a faster version.

package main
import (
	&quot;strings&quot;
	&quot;testing&quot;
)
// Splits the main line from extra information
func splitMessageExtra(line *string) string {
	const newline = &quot;#012&quot;
	i := strings.Index(*line, newline)
	if i &lt; 0 {
		return &quot;&quot;
	}
	extra := (*line)[i+len(newline):]
	*line = (*line)[0:i]
	return extra
}
var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread &quot;processor_mta03&quot;]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=&lt;&gt;, size=24431, nrcpt=1 (queue active)`
func BenchmarkSplitMessageExtra(b *testing.B) {
	for i := 0; i &lt; b.N; i++ {
		msg := testMessage
		splitMessageExtra(&amp;msg)
	}
}

Output:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: extra
BenchmarkSplitMessageExtra-4   	50000000	        32.2 ns/op
PASS
ok  	extra	1.647s

For comparison, here are the results from your code and benchmark. Your code and benchmark are slower than mine: 968 ns/op and 50.184s versus 32.2 ns/op and 1.647s respectively.

package main
import (
	&quot;testing&quot;
)
// Splits the main line from extra information
func splitMessageExtra(line *string) string {
	var prev rune
	for i, char := range *line {
		if prev == 0 &amp;&amp; char == &#39;#&#39; {
			prev = char
			continue
		}
		if prev == &#39;#&#39; &amp;&amp; char == &#39;0&#39; {
			prev = char
			continue
		}
		if prev == &#39;0&#39; &amp;&amp; char == &#39;1&#39; {
			prev = char
			continue
		}
		if prev == &#39;1&#39; &amp;&amp; char == &#39;2&#39; {
			extra := (*line)[i+1:]
			*line = (*line)[0 : i-3]
			return extra
		}
		prev = 0
	}
	return &quot;&quot;
}
var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread &quot;processor_mta03&quot;]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=&lt;&gt;, size=24431, nrcpt=1 (queue active)`
func BenchmarkSplitMessageExtra(b *testing.B) {
	for i := 0; i &lt; b.N; i++ {
		b.StopTimer()
		msg := string([]byte(testMessage))
		b.StartTimer()
		splitMessageExtra(&amp;msg)
	}
}

Output:

$ go test -bench=.
goos: linux
goarch: amd64
pkg: extra
BenchmarkSplitMessageExtra-4   	 2000000	       968 ns/op	
PASS
ok  	extra	50.184s

Some of your code is unnecessary; it uses CPU time and triggers allocations. For example, converting to utf-8 bytes to runes,for i, char := range *line {}, and converting string to []byte to string, string([]byte(testMessage)) . Some algorithms could be improved. For example, searching for a newline.

答案2

得分: 0

这里没有什么问题。StopTimer和StartTimer的开销应该比splitMessageExtra大得多。它们都调用了runtime.ReadMemStats。请参考这里。

英文:

There is nothing wrong. StopTimer and StartTimer should be much more expensive than splitMessageExtra. Both of them call runtime.ReadMemStats. See here.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何对修改输入的函数进行基准测试？

问题

答案1

答案2

检查来自Firestore Get golang sdk的错误返回代码。

在Golang中，针对int64键的更好的分片函数是什么？

使用”go get”命令安装Go包是否算作在GitHub上进行克隆操作？

从MongoDB中的两个集合中采样具有多个数组匹配的数据。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。