英文:
Confusing (and incorrect?) output from Go profiler
问题
我有一个Go二进制文件,我正在尝试对其进行性能分析,并且我得到了令人惊讶的结果。代码在main.go
中有以下(截断的)部分,其余的代码在monte
包中:
package main
import (
"monte"
"runtime/pprof"
)
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
flag.Parse()
if *cpuprofile != "" {
f, err := os.Create(*cpuprofile)
if err != nil {
log.Fatal(err)
}
pprof.StartCPUProfile(f)
}
monte.ExpensiveOperation()
pprof.StopCPUProfile()
}
我使用go build src/main.go
构建可执行文件,然后使用./main -cpuprofile=monte.prof
运行它。当我使用go tool pprof main monte.prof
检查输出时,我得到以下输出:
(pprof) top10 --cum
Total: 346 samples
280 80.9% 80.9% 343 99.1% time.Time.Format
0 0.0% 80.9% 341 98.6% runtime.interhash
0 0.0% 80.9% 341 98.6% runtime.main
0 0.0% 80.9% 251 72.5% strconv.Unquote
13 3.8% 84.7% 31 9.0% strconv.roundShortest
11 3.2% 87.9% 18 5.2% strconv.fmtE
9 2.6% 90.5% 9 2.6% runtime.markallocated
1 0.3% 90.8% 8 2.3% math/rand.Float64
2 0.6% 91.3% 8 2.3% runtime.FixAlloc_Free
7 2.0% 93.4% 8 2.3% time.nextStdChunk
具有最大累积时间的函数是time.Time.Format
,这对我来说似乎是错误的(应该是main
吗?),而且根本没有提到monte
,尽管“昂贵的操作”需要大约10秒才能完成,足够的时间供采样器观察到它。如果我向go tool pprof
传递--focus=monte
标志,根本不会显示任何样本。我想我可能在某个地方漏掉了一些标志;有人有任何想法吗?谢谢!
英文:
I have a Go binary that I'm trying to profile, and I'm getting surprising results. The code has the following (truncated) in main.go
, and the rest of the code is in the package monte
:
package main
import (
"monte"
"runtime/pprof"
)
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
flag.Parse()
if *cpuprofile != "" {
f, err := os.Create(*cpuprofile)
if err != nil {
log.Fatal(err)
}
pprof.StartCPUProfile(f)
}
monte.ExpensiveOperation()
pprof.StopCPUProfile()
}
I build my executable with go build src/main.go
, then run it with ./main -cpuprofile=monte.prof
. When I examine the output with go tool pprof main monte.prof
, I get the following output:
(pprof) top10 --cum
Total: 346 samples
280 80.9% 80.9% 343 99.1% time.Time.Format
0 0.0% 80.9% 341 98.6% runtime.interhash
0 0.0% 80.9% 341 98.6% runtime.main
0 0.0% 80.9% 251 72.5% strconv.Unquote
13 3.8% 84.7% 31 9.0% strconv.roundShortest
11 3.2% 87.9% 18 5.2% strconv.fmtE
9 2.6% 90.5% 9 2.6% runtime.markallocated
1 0.3% 90.8% 8 2.3% math/rand.Float64
2 0.6% 91.3% 8 2.3% runtime.FixAlloc_Free
7 2.0% 93.4% 8 2.3% time.nextStdChunk
The function that has the largest cumulative time is time.Time.Format
, which seems wrong to me (shouldn't it be main
?) and there's no mention of monte
at all, despite the fact that the "expensive operation" takes about 10 seconds to complete, plenty of time for the sampler to see it. If I pass the --focus=monte
flag to go tool pprof
, no samples are shown at all. I assume I'm missing some flag somewhere; does anyone have any ideas? Thanks!
答案1
得分: 2
你的main
包无法编译,并且你没有提供monte
包的源代码。因此,我们无法重现你的结果,这是调试的第一步。
这里有一些可工作的源代码和结果。
package monte
func ExpensiveOperation() {
var t float64
for i := int64(0); i < 1e10; i++ {
t += 1
}
}
package main
import (
"flag"
"log"
"monte"
"os"
"runtime/pprof"
)
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
flag.Parse()
if *cpuprofile != "" {
f, err := os.Create(*cpuprofile)
if err != nil {
log.Fatal(err)
}
pprof.StartCPUProfile(f)
}
monte.ExpensiveOperation()
pprof.StopCPUProfile()
}
(pprof) top10 --cum
Total: 1166 samples
0 0.0% 0.0% 1166 100.0% main.main
1166 100.0% 100.0% 1166 100.0% monte.ExpensiveOperation
0 0.0% 100.0% 1166 100.0% runtime.main
0 0.0% 100.0% 1166 100.0% schedunlock
更新:
github.com/haldean/monte
上的代码没有提供有意义的结果。它只花费了0m0.506s的时间,并且只有48个样本。
(pprof) top10 --cum
Total: 48 samples
0 0.0% 0.0% 45 93.8% main.main
0 0.0% 0.0% 45 93.8% monte.(*Scene).Evaluate
0 0.0% 0.0% 45 93.8% monte.(*Scene).Render
0 0.0% 0.0% 45 93.8% monte.(*Scene).SetColor
0 0.0% 0.0% 45 93.8% runtime.main
0 0.0% 0.0% 45 93.8% schedunlock
0 0.0% 0.0% 44 91.7% monte.(*Scene).RayCast
4 8.3% 8.3% 31 64.6% runtime.new
13 27.1% 35.4% 27 56.2% runtime.mallocgc
3 6.2% 41.7% 26 54.2% monte.(*Scene).DirectionAt
英文:
Your main
package doesn't compile and you don't provide source code for your monte
package. Therefore, we can't reproduce your results. which is the first step in debugging.
Here's some working source code and results.
package monte
func ExpensiveOperation() {
var t float64
for i := int64(0); i < 1e10; i++ {
t += 1
}
}
.
package main
import (
"flag"
"log"
"monte"
"os"
"runtime/pprof"
)
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
flag.Parse()
if *cpuprofile != "" {
f, err := os.Create(*cpuprofile)
if err != nil {
log.Fatal(err)
}
pprof.StartCPUProfile(f)
}
monte.ExpensiveOperation()
pprof.StopCPUProfile()
}
.
(pprof) top10 --cum
Total: 1166 samples
0 0.0% 0.0% 1166 100.0% main.main
1166 100.0% 100.0% 1166 100.0% monte.ExpensiveOperation
0 0.0% 100.0% 1166 100.0% runtime.main
0 0.0% 100.0% 1166 100.0% schedunlock
UPDATE:
The code at github.com/haldean/monte
doesn't provide meaningful results. It only takes time real 0m0.506s and only takes 48 samples.
(pprof) top10 --cum
Total: 48 samples
0 0.0% 0.0% 45 93.8% main.main
0 0.0% 0.0% 45 93.8% monte.(*Scene).Evaluate
0 0.0% 0.0% 45 93.8% monte.(*Scene).Render
0 0.0% 0.0% 45 93.8% monte.(*Scene).SetColor
0 0.0% 0.0% 45 93.8% runtime.main
0 0.0% 0.0% 45 93.8% schedunlock
0 0.0% 0.0% 44 91.7% monte.(*Scene).RayCast
4 8.3% 8.3% 31 64.6% runtime.new
13 27.1% 35.4% 27 56.2% runtime.mallocgc
3 6.2% 41.7% 26 54.2% monte.(*Scene).DirectionAt
答案2
得分: 2
看起来像是一个仅支持CPU的分析器,所以如果你的ExpensiveOperation
通过执行I/O操作、休眠或类似的操作来消耗时间,它将是不可见的。(这就是“cpu”分析器的问题。)
就数字的含义而言,看起来总共有346个样本。
这些数字有点模糊并不太令人惊讶,这取决于分析器的工作方式,但如果它是一个真正的堆栈采样器,那么这些数字的含义将是:
341/346个样本在堆栈上有main
和interhash
。你期望所有样本在堆栈上都有main,但这是模糊的部分。
343/346个样本在堆栈上有Format
。(为什么有更多的Format
而不是main
,谁知道呢?)
251/346个样本在堆栈上有Unquote
。所以在这251个样本中,它们可能还有main
、interhash
和Format
在堆栈上。
通过这种侦探工作,你可以慢慢开始拼凑出样本所告诉你的信息。
当然,如果你能够实际看到堆栈样本,你在看到很少的样本之前就会知道确切发生了什么。
英文:
It looks like a CPU-only profiler, so if your ExpensiveOperation
takes time by doing I/O or sleep or something like that, it will be invisible.
(That's the problem with "cpu" profilers.)
In terms of what the numbers mean, it looks like there were 346 total samples.
It's not too surprising that the numbers are a bit squishy, depending on how the profiler works, but if it were a true stack sampler the numbers would mean this:
341/346 samples had main
and interhash
on the stack. You would expect all samples to have main on the stack, but that's the squishy part.
343/346 samples had Format
on the stack. (Why there's more of those than main
, who knows?)
251/346 samples had Unquote
on the stack. So of those 251 samples, they also probably had main
, interhash
, and Format
on the stack.
With this kind of detective work, you can slowly begin to piece together what the samples are telling you.
Of course, if you could actually see the stack samples, you wouldn't have to see very many of them before you would know exactly what's going on.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论