How to get a function-duration breakdown in Go (profiling)


Question


Update (Jan 24, 2019):

This question was asked 4 years ago about Go 1.4 (and is still getting views). Profiling with pprof has changed dramatically since then.

Original Question:

I'm trying to profile a Go Martini-based server I wrote. I want to profile a single request and get a complete breakdown of the functions called, along with how long each one ran.
I tried playing around with both runtime/pprof and net/http/pprof, but the output looks like this:

Total: 3 samples
       1  33.3%  33.3%        1  33.3% ExternalCode
       1  33.3%  66.7%        1  33.3% runtime.futex
       1  33.3% 100.0%        2  66.7% syscall.Syscall

The web view is not very helpful either.
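
The question doesn't show how the profiling endpoints were wired up; a minimal sketch of one common setup — the stock net/http/pprof handlers served on a side port next to the Martini app — might look something like this (the port and route are illustrative assumptions, not the original server's code):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux

	"github.com/go-martini/martini"
)

func main() {
	// Serve the pprof endpoints on a separate port so they bypass
	// Martini's router entirely.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	m := martini.Classic()
	m.Get("/", func() string {
		return "hello"
	})
	m.Run()
}

With that in place, go tool pprof http://localhost:6060/debug/pprof/profile collects a 30-second CPU profile while the server is handling requests.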

We regularly profile another program, and the output seems to be what I need:

20ms of 20ms total (  100%)
      flat  flat%  sum%        cum  cum%
      10ms 50.00% 50.00%     10ms 50.00%  runtime.duffcopy
      10ms 50.00%   100%     10ms 50.00%  runtime.fastrand1
         0     0%   100%     20ms   100%  main.func·004
         0     0%   100%     20ms   100%  main.pruneAlerts
         0     0%   100%     20ms   100%  runtime.memclr

I can't tell where the difference is coming from.

Answer 1

Score: 7


pprof is a timer-based sampling profiler, originally from the gperftools suite. Russ Cox later ported the pprof tools to Go: http://research.swtch.com/pprof.

This timer-based profiler works by using the system profiling timer and recording statistics whenever it receives SIGPROF. In Go, the rate is currently fixed at 100 Hz. From pprof.go:

// The runtime routines allow a variable profiling rate,
// but in practice operating systems cannot trigger signals
// at more than about 500 Hz, and our processing of the
// signal is not cheap (mostly getting the stack trace).
// 100 Hz is a reasonable choice: it is frequent enough to
// produce useful data, rare enough not to bog down the
// system, and a nice round number to make it easy to
// convert sample counts to seconds.  Instead of requiring
// each client to specify the frequency, we hard code it.
const hz = 100

You can set this frequency by calling runtime.SetCPUProfileRate and writing the profile output yourself; gperftools lets you set it with CPUPROFILE_FREQUENCY, but in practice raising the rate isn't all that useful.
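
A minimal sketch of that flow with runtime/pprof is shown below (the output file name, the 500 Hz value, and the hotLoop workload are illustrative assumptions; pprof.StartCPUProfile hard-codes 100 Hz internally, so the runtime may print a warning when it tries to set the rate again, but the rate requested first is the one that stays in effect):

package main

import (
	"crypto/sha256"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// hotLoop is a stand-in for the code you actually want to measure.
func hotLoop() {
	buf := []byte("some representative input")
	for i := 0; i < 1000000; i++ {
		sum := sha256.Sum256(buf)
		buf = sum[:]
	}
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Ask for a higher sampling rate than the default 100 Hz. This has to
	// happen before StartCPUProfile; the runtime may print a warning when
	// StartCPUProfile later tries to set its own 100 Hz rate, but the rate
	// requested here is what remains in effect.
	runtime.SetCPUProfileRate(500)

	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	hotLoop()
}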

In order to sample a program, it needs to be doing what you're trying to measure at all times; sampling the idle runtime doesn't show anything useful. What you usually do is run the code you want to measure in a benchmark, or in a hot loop, using as much CPU time as possible. After accumulating enough samples, there should be a sufficient number across all functions to show you, proportionally, how much time is spent in each one.
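
For example, driving the interesting code from a benchmark gives the sampler plenty of on-CPU time to work with. The package and function names here are hypothetical stand-ins; the -cpuprofile flag is the standard one from the testing package:

package alerts

import (
	"crypto/sha256"
	"testing"
)

// workUnderTest stands in for whatever you actually want to measure,
// e.g. the body of a request handler.
func workUnderTest() [32]byte {
	return sha256.Sum256([]byte("some representative input"))
}

// sink keeps the compiler from optimizing the call away.
var sink [32]byte

func BenchmarkWorkUnderTest(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = workUnderTest()
	}
}

Saved in a _test.go file, this can be run with go test -bench=. -cpuprofile=cpu.out, and the resulting profile inspected with go tool pprof cpu.out (older toolchains also want the generated .test binary as an argument), which yields the kind of flat/cum breakdown shown above.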


huangapple
  • Published on December 9, 2014, 01:19:14
  • When reposting, please keep a link to this article: https://go.coder-hub.com/27363325.html