2012年10月8日 11:15:19go评论115阅读模式

英文:

Confusing (and incorrect?) output from Go profiler

问题

我有一个Go二进制文件，我正在尝试对其进行性能分析，并且我得到了令人惊讶的结果。代码在main.go中有以下（截断的）部分，其余的代码在monte包中：

package main
import (
  "monte"
  "runtime/pprof"
)
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
  flag.Parse()
  if *cpuprofile != "" {
    f, err := os.Create(*cpuprofile)
    if err != nil {
      log.Fatal(err)
    }
    pprof.StartCPUProfile(f)
  }
  monte.ExpensiveOperation()
  pprof.StopCPUProfile()
}

我使用go build src/main.go构建可执行文件，然后使用./main -cpuprofile=monte.prof运行它。当我使用go tool pprof main monte.prof检查输出时，我得到以下输出：

(pprof) top10 --cum
Total: 346 samples
     280  80.9%  80.9%      343  99.1% time.Time.Format
       0   0.0%  80.9%      341  98.6% runtime.interhash
       0   0.0%  80.9%      341  98.6% runtime.main
       0   0.0%  80.9%      251  72.5% strconv.Unquote
      13   3.8%  84.7%       31   9.0% strconv.roundShortest
      11   3.2%  87.9%       18   5.2% strconv.fmtE
       9   2.6%  90.5%        9   2.6% runtime.markallocated
       1   0.3%  90.8%        8   2.3% math/rand.Float64
       2   0.6%  91.3%        8   2.3% runtime.FixAlloc_Free
       7   2.0%  93.4%        8   2.3% time.nextStdChunk

具有最大累积时间的函数是time.Time.Format，这对我来说似乎是错误的（应该是main吗？），而且根本没有提到monte，尽管“昂贵的操作”需要大约10秒才能完成，足够的时间供采样器观察到它。如果我向go tool pprof传递--focus=monte标志，根本不会显示任何样本。我想我可能在某个地方漏掉了一些标志；有人有任何想法吗？谢谢！

英文:

I have a Go binary that I'm trying to profile, and I'm getting surprising results. The code has the following (truncated) in main.go, and the rest of the code is in the package monte:

package main
import (
  &quot;monte&quot;
  &quot;runtime/pprof&quot;
)
var cpuprofile = flag.String(&quot;cpuprofile&quot;, &quot;&quot;, &quot;write cpu profile to file&quot;)
func main() {
  flag.Parse()
  if *cpuprofile != &quot;&quot; {
    f, err := os.Create(*cpuprofile)
    if err != nil {
      log.Fatal(err)
    }
    pprof.StartCPUProfile(f)
  }
  monte.ExpensiveOperation()
  pprof.StopCPUProfile()
}

I build my executable with go build src/main.go, then run it with ./main -cpuprofile=monte.prof. When I examine the output with go tool pprof main monte.prof, I get the following output:

(pprof) top10 --cum
Total: 346 samples
     280  80.9%  80.9%      343  99.1% time.Time.Format
       0   0.0%  80.9%      341  98.6% runtime.interhash
       0   0.0%  80.9%      341  98.6% runtime.main
       0   0.0%  80.9%      251  72.5% strconv.Unquote
      13   3.8%  84.7%       31   9.0% strconv.roundShortest
      11   3.2%  87.9%       18   5.2% strconv.fmtE
       9   2.6%  90.5%        9   2.6% runtime.markallocated
       1   0.3%  90.8%        8   2.3% math/rand.Float64
       2   0.6%  91.3%        8   2.3% runtime.FixAlloc_Free
       7   2.0%  93.4%        8   2.3% time.nextStdChunk

The function that has the largest cumulative time is time.Time.Format, which seems wrong to me (shouldn't it be main?) and there's no mention of monte at all, despite the fact that the "expensive operation" takes about 10 seconds to complete, plenty of time for the sampler to see it. If I pass the --focus=monte flag to go tool pprof, no samples are shown at all. I assume I'm missing some flag somewhere; does anyone have any ideas? Thanks!

答案1

得分: 2

你的main包无法编译，并且你没有提供monte包的源代码。因此，我们无法重现你的结果，这是调试的第一步。

这里有一些可工作的源代码和结果。

package monte
func ExpensiveOperation() {
    var t float64
    for i := int64(0); i < 1e10; i++ {
        t += 1
    }
}

package main
import (
    "flag"
    "log"
    "monte"
    "os"
    "runtime/pprof"
)
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
    flag.Parse()
    if *cpuprofile != "" {
        f, err := os.Create(*cpuprofile)
        if err != nil {
            log.Fatal(err)
        }
        pprof.StartCPUProfile(f)
    }
    monte.ExpensiveOperation()
    pprof.StopCPUProfile()
}

(pprof) top10 --cum
Total: 1166 samples
0 0.0% 0.0% 1166 100.0% main.main
1166 100.0% 100.0% 1166 100.0% monte.ExpensiveOperation
0 0.0% 100.0% 1166 100.0% runtime.main
0 0.0% 100.0% 1166 100.0% schedunlock

更新：

github.com/haldean/monte上的代码没有提供有意义的结果。它只花费了0m0.506s的时间，并且只有48个样本。

(pprof) top10 --cum
Total: 48 samples
0 0.0% 0.0% 45 93.8% main.main
0 0.0% 0.0% 45 93.8% monte.(*Scene).Evaluate
0 0.0% 0.0% 45 93.8% monte.(*Scene).Render
0 0.0% 0.0% 45 93.8% monte.(*Scene).SetColor
0 0.0% 0.0% 45 93.8% runtime.main
0 0.0% 0.0% 45 93.8% schedunlock
0 0.0% 0.0% 44 91.7% monte.(*Scene).RayCast
4 8.3% 8.3% 31 64.6% runtime.new
13 27.1% 35.4% 27 56.2% runtime.mallocgc
3 6.2% 41.7% 26 54.2% monte.(*Scene).DirectionAt

英文:

Your main package doesn't compile and you don't provide source code for your monte package. Therefore, we can't reproduce your results. which is the first step in debugging.

Here's some working source code and results.

package monte
func ExpensiveOperation() {
	var t float64
	for i := int64(0); i &lt; 1e10; i++ {
		t += 1
	}
}

package main
import (
	&quot;flag&quot;
	&quot;log&quot;
	&quot;monte&quot;
	&quot;os&quot;
	&quot;runtime/pprof&quot;
)
var cpuprofile = flag.String(&quot;cpuprofile&quot;, &quot;&quot;, &quot;write cpu profile to file&quot;)
func main() {
	flag.Parse()
	if *cpuprofile != &quot;&quot; {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal(err)
		}
		pprof.StartCPUProfile(f)
	}
	monte.ExpensiveOperation()
	pprof.StopCPUProfile()
}

(pprof) top10 --cum
Total: 1166 samples
       0   0.0%   0.0%     1166 100.0% main.main
    1166 100.0% 100.0%     1166 100.0% monte.ExpensiveOperation
       0   0.0% 100.0%     1166 100.0% runtime.main
       0   0.0% 100.0%     1166 100.0% schedunlock

UPDATE:

The code at github.com/haldean/monte doesn't provide meaningful results. It only takes time real 0m0.506s and only takes 48 samples.

(pprof) top10 --cum
Total: 48 samples
       0   0.0%   0.0%       45  93.8% main.main
       0   0.0%   0.0%       45  93.8% monte.(*Scene).Evaluate
       0   0.0%   0.0%       45  93.8% monte.(*Scene).Render
       0   0.0%   0.0%       45  93.8% monte.(*Scene).SetColor
       0   0.0%   0.0%       45  93.8% runtime.main
       0   0.0%   0.0%       45  93.8% schedunlock
       0   0.0%   0.0%       44  91.7% monte.(*Scene).RayCast
       4   8.3%   8.3%       31  64.6% runtime.new
      13  27.1%  35.4%       27  56.2% runtime.mallocgc
       3   6.2%  41.7%       26  54.2% monte.(*Scene).DirectionAt

答案2

得分: 2

看起来像是一个仅支持CPU的分析器，所以如果你的ExpensiveOperation通过执行I/O操作、休眠或类似的操作来消耗时间，它将是不可见的。（这就是“cpu”分析器的问题。）

就数字的含义而言，看起来总共有346个样本。
这些数字有点模糊并不太令人惊讶，这取决于分析器的工作方式，但如果它是一个真正的堆栈采样器，那么这些数字的含义将是：

341/346个样本在堆栈上有main和interhash。你期望所有样本在堆栈上都有main，但这是模糊的部分。

343/346个样本在堆栈上有Format。（为什么有更多的Format而不是main，谁知道呢？）

251/346个样本在堆栈上有Unquote。所以在这251个样本中，它们可能还有main、interhash和Format在堆栈上。

通过这种侦探工作，你可以慢慢开始拼凑出样本所告诉你的信息。

当然，如果你能够实际看到堆栈样本，你在看到很少的样本之前就会知道确切发生了什么。

更多相关信息。

英文:

It looks like a CPU-only profiler, so if your ExpensiveOperation takes time by doing I/O or sleep or something like that, it will be invisible.
(That's the problem with "cpu" profilers.)

In terms of what the numbers mean, it looks like there were 346 total samples.
It's not too surprising that the numbers are a bit squishy, depending on how the profiler works, but if it were a true stack sampler the numbers would mean this:

341/346 samples had main and interhash on the stack. You would expect all samples to have main on the stack, but that's the squishy part.

343/346 samples had Format on the stack. (Why there's more of those than main, who knows?)

251/346 samples had Unquote on the stack. So of those 251 samples, they also probably had main, interhash, and Format on the stack.

With this kind of detective work, you can slowly begin to piece together what the samples are telling you.

Of course, if you could actually see the stack samples, you wouldn't have to see very many of them before you would know exactly what's going on.

More about all that.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

混乱（和错误？）的Go分析器输出

问题

答案1

答案2

将代码转译为精简的C代码？

Redigo：当Redis服务器宕机时快速失败。

正确的习惯用法字节缓冲区的读写

make a map with different data types in go for making post request with json data

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。