How to get a function-duration breakdown in go (profiling)
Question
Update (Jan 24, 2019):
This question was asked 4 years ago about Go 1.4 (and is still getting views). Profiling with pprof has changed dramatically since then.
Original Question:
I'm trying to profile a Go Martini-based server I wrote. I want to profile a single request and get a complete breakdown of the functions called, along with their run durations. I tried playing around with both runtime/pprof and net/http/pprof, but the output looks like this:
Total: 3 samples
1 33.3% 33.3% 1 33.3% ExternalCode
1 33.3% 66.7% 1 33.3% runtime.futex
1 33.3% 100.0% 2 66.7% syscall.Syscall
The web view is not very helpful either.
We regularly profile another program, and the output seems to be what I need:
20ms of 20ms total ( 100%)
flat flat% sum% cum cum%
10ms 50.00% 50.00% 10ms 50.00% runtime.duffcopy
10ms 50.00% 100% 10ms 50.00% runtime.fastrand1
0 0% 100% 20ms 100% main.func·004
0 0% 100% 20ms 100% main.pruneAlerts
0 0% 100% 20ms 100% runtime.memclr
I can't tell where the difference is coming from.
Answer 1
Score: 7
pprof is a timer-based sampling profiler, originally from the gperftools suite. Russ Cox later ported the pprof tools to Go: http://research.swtch.com/pprof.
This timer-based profiler works by using the system profiling timer and recording statistics whenever it receives SIGPROF. In Go, the rate is currently set to a constant 100 Hz. From pprof.go:
// The runtime routines allow a variable profiling rate,
// but in practice operating systems cannot trigger signals
// at more than about 500 Hz, and our processing of the
// signal is not cheap (mostly getting the stack trace).
// 100 Hz is a reasonable choice: it is frequent enough to
// produce useful data, rare enough not to bog down the
// system, and a nice round number to make it easy to
// convert sample counts to seconds. Instead of requiring
// each client to specify the frequency, we hard code it.
const hz = 100
You can set this frequency by calling runtime.SetCPUProfileRate and writing the profile output yourself; Gperftools also lets you set the frequency with CPUPROFILE_FREQUENCY, but in practice it's not that useful.
In order to sample a program, it needs to be doing what you're trying to measure at all times. Sampling the idle runtime isn't showing anything useful. What you usually do is run the code you want in a benchmark, or in a hot loop, using as much CPU time as possible. After accumulating enough samples, there should be a sufficient number across all functions to show you proportionally how much time is spent in each function.
See also:
- http://golang.org/pkg/runtime/pprof/
- http://golang.org/pkg/net/http/pprof/
- http://blog.golang.org/profiling-go-programs
- https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs