How to get a function-duration breakdown in go (profiling)
Question
Update (Jan 24, 2019):
This question was asked 4 years ago about Go 1.4 (and is still getting views). Profiling with pprof has changed dramatically since then.
Original Question:
I'm trying to profile a Go Martini-based server I wrote. I want to profile a single request and get a complete breakdown of the functions called, along with their run durations. I tried playing around with both runtime/pprof and net/http/pprof, but the output looks like this:
Total: 3 samples
1 33.3% 33.3% 1 33.3% ExternalCode
1 33.3% 66.7% 1 33.3% runtime.futex
1 33.3% 100.0% 2 66.7% syscall.Syscall
The web view is not very helpful either.
We regularly profile another program, and the output seems to be what I need:
20ms of 20ms total ( 100%)
flat flat% sum% cum cum%
10ms 50.00% 50.00% 10ms 50.00% runtime.duffcopy
10ms 50.00% 100% 10ms 50.00% runtime.fastrand1
0 0% 100% 20ms 100% main.func·004
0 0% 100% 20ms 100% main.pruneAlerts
0 0% 100% 20ms 100% runtime.memclr
I can't tell where the difference is coming from.
Answer 1
Score: 7
pprof is a timer-based sampling profiler, originally from the gperftools suite. Russ Cox later ported the pprof tools to Go: http://research.swtch.com/pprof.
This timer-based profiler works by using the system profiling timer and recording statistics whenever it receives SIGPROF. In Go, the rate is currently set to a constant 100 Hz. From pprof.go:
// The runtime routines allow a variable profiling rate,
// but in practice operating systems cannot trigger signals
// at more than about 500 Hz, and our processing of the
// signal is not cheap (mostly getting the stack trace).
// 100 Hz is a reasonable choice: it is frequent enough to
// produce useful data, rare enough not to bog down the
// system, and a nice round number to make it easy to
// convert sample counts to seconds. Instead of requiring
// each client to specify the frequency, we hard code it.
const hz = 100
You can set this frequency by calling runtime.SetCPUProfileRate and writing the profile output yourself; Gperftools also lets you set the frequency with CPUPROFILE_FREQUENCY, but in practice it's not that useful.
In order to sample a program, it needs to be doing what you're trying to measure at all times. Sampling the idle runtime isn't showing anything useful. What you usually do is run the code you want in a benchmark, or in a hot loop, using as much CPU time as possible. After accumulating enough samples, there should be a sufficient number across all functions to show you proportionally how much time is spent in each function.
See also:
- http://golang.org/pkg/runtime/pprof/
- http://golang.org/pkg/net/http/pprof/
- http://blog.golang.org/profiling-go-programs
- https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs