英文:
Seemingly inconsistent results when profiling memory in Go
问题
最近我在大型数据集上运行一些用Go编写的数值代码时遇到了内存管理问题。在尝试对问题进行分析时,我用三种不同的方式测量了程序的内存使用情况:使用Go的runtime/pprof
包、使用Unix的time
实用程序,以及通过手动累加分配的数据的大小。这三种方法给出的结果不一致。
下面是我正在分析的代码的简化版本。它分配了几个切片,在每个索引位置放置值,并将它们放入父切片中:
package main
import (
"fmt"
"os"
"runtime/pprof"
"unsafe"
"flag"
)
var mprof = flag.String("mprof", "", "write memory profile to this file")
func main() {
flag.Parse()
N := 1<<15
psSlice := make([][]int64, N)
_ = psSlice
size := 0
for i := 0; i < N; i++ {
ps := make([]int64, 1<<10)
for i := range ps { ps[i] = int64(i) }
psSlice[i] = ps
size += int(unsafe.Sizeof(ps[0])) * len(ps)
}
if *mprof != "" {
f, err := os.Create(*mprof)
if err != nil { panic(err) }
pprof.WriteHeapProfile(f)
f.Close()
}
fmt.Printf("total allocated: %d MB\n", size >> 20)
}
使用命令$ time time -f "%M kB" ./mem_test -mprof=out.mprof
运行该代码会得到以下输出:
total allocated: 256 MB
1141216 kB
real 0m0.150s
user 0m0.031s
sys 0m0.113s
这里的第一个数字256 MB只是根据unsafe.Sizeof
计算的数组大小,而第二个数字1055 MB是time
报告的结果。运行pprof工具的结果是:
(pprof) top1
Total: 108.2 MB
107.8 99.5% 99.5% 107.8 99.5% main.main
这些结果在切片长度较小或较大时按预期平滑地缩放。
为什么这三个数字没有更接近?
英文:
I have recently been running some numerical codes written in Go on large datasets and have been encountering memory management issues. While attempting to profile the problem, I have measured the memory usage of my program in three different ways: with Go's runtime/pprof
package, with the unix time
utility, and by manually adding up the size of the data that I allocated. These three methods do not give me consistent results.
Below is a simplified version of the code that I am profiling. It allocates several slices, puts values at every index and places each of them inside of a parent slice:
package main
import (
"fmt"
"os"
"runtime/pprof"
"unsafe"
"flag"
)
var mprof = flag.String("mprof", "", "write memory profile to this file")
func main() {
flag.Parse()
N := 1<<15
psSlice := make([][]int64, N)
_ = psSlice
size := 0
for i := 0; i < N; i++ {
ps := make([]int64, 1<<10)
for i := range ps { ps[i] = int64(i) }
psSlice[i] = ps
size += int(unsafe.Sizeof(ps[0])) * len(ps)
}
if *mprof != "" {
f, err := os.Create(*mprof)
if err != nil { panic(err) }
pprof.WriteHeapProfile(f)
f.Close()
}
fmt.Printf("total allocated: %d MB\n", size >> 20)
}
Running this with the command $ time time -f "%M kB" ./mem_test -mprof=out.mprof
results in the output:
total allocated: 256 MB
1141216 kB
real 0m0.150s
user 0m0.031s
sys 0m0.113s
Here the first number, 256 MB, is just the size of the arrays computed from unsafe.Sizeof
and the second number, 1055 MB, is what time
reports. Running the pprof tool results in
(pprof) top1
Total: 108.2 MB
107.8 99.5% 99.5% 107.8 99.5% main.main
These results scale smoothly in the way you would expect them to for slices of smaller or larger lengths.
Why don't these three number line up more closely?
答案1
得分: 5
首先,你需要提供一个没有错误的示例。让我们从基本的数字开始。例如,
package main
import (
"fmt"
"runtime"
"unsafe"
)
func WriteMatrix(nm [][]int64) {
for n := range nm {
for m := range nm[n] {
nm[n][m]++
}
}
}
func NewMatrix(n, m int) [][]int64 {
a := make([]int64, n*m)
nm := make([][]int64, n)
lo, hi := 0, m
for i := range nm {
nm[i] = a[lo:hi:hi]
lo, hi = hi, hi+m
}
return nm
}
func MatrixSize(nm [][]int64) int64 {
size := int64(0)
for i := range nm {
size += int64(unsafe.Sizeof(nm[i]))
for j := range nm[i] {
size += int64(unsafe.Sizeof(nm[i][j]))
}
}
return size
}
var nm [][]int64
func main() {
n, m := 1<<15, 1<<10
var ms1, ms2 runtime.MemStats
runtime.ReadMemStats(&ms1)
nm = NewMatrix(n, m)
WriteMatrix(nm)
runtime.ReadMemStats(&ms2)
fmt.Println(runtime.GOARCH, runtime.GOOS)
fmt.Println("Actual: ", ms2.TotalAlloc-ms1.TotalAlloc)
fmt.Println("Estimate:", n*3*8+n*m*8)
fmt.Println("Total: ", ms2.TotalAlloc)
fmt.Println("Size: ", MatrixSize(nm))
// check top VIRT and RES for COMMAND peter
for {
WriteMatrix(nm)
}
}
输出:
$ go build peter.go && /usr/bin/time -f "%M KiB" ./peter
amd64 linux
Actual: 269221888
Estimate: 269221888
Total: 269240592
Size: 269221888
^C
Command exited with non-zero status 2
265220 KiB
$
$ top
VIRT 284268 RES 265136 COMMAND peter
这符合你的预期吗?
请参考MatrixSize
函数来正确计算内存大小。
在允许我们使用top
命令的无限循环中,通过更新矩阵将其固定为常驻内存。
当你运行这个程序时,你得到什么结果?
错误:
你从/usr/bin/time
得到的结果是1056992 KiB
,这个结果比实际值大了四倍。这是你版本的/usr/bin/time
的一个错误,ru_maxrss
报告的是以KB为单位而不是页数。我的Ubuntu版本已经修复了这个问题。
参考资料:
Re: GNU time: incorrect results
time-1.7 counts rusage wrong on Linux
“time” 1.7-24 source package in Ubuntu. ru_maxrss
报告的是以KB为单位而不是页数。 (Closes: #649402)
#649402 - [PATCH] time overestimates max RSS by a factor of 4 - Debian Bug report logs
> Subject: Fix ru_maxrss reporting Author: Richard Kettlewell
> Bug-Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649402
>
> --- time-1.7.orig/time.c
> +++ time-1.7/time.c
> @@ -392,7 +398,7 @@
> ptok ((UL) resp->ru.ru_ixrss) / MSEC_TO_TICKS (v));
> break;
> case 'M': /* Maximum resident set size. /
> - fprintf (fp, "%lu", ptok ((UL) resp->ru.ru_maxrss));
> + fprintf (fp, "%lu", (UL) resp->ru.ru_maxrss);
> break;
> case 'O': / Outputs. */
> fprintf (fp, "%ld", resp->ru.ru_oublock);
英文:
First, you need to provide an error free example. Let's start with the basic numbers. For example,
package main
import (
"fmt"
"runtime"
"unsafe"
)
func WriteMatrix(nm [][]int64) {
for n := range nm {
for m := range nm[n] {
nm[n][m]++
}
}
}
func NewMatrix(n, m int) [][]int64 {
a := make([]int64, n*m)
nm := make([][]int64, n)
lo, hi := 0, m
for i := range nm {
nm[i] = a[lo:hi:hi]
lo, hi = hi, hi+m
}
return nm
}
func MatrixSize(nm [][]int64) int64 {
size := int64(0)
for i := range nm {
size += int64(unsafe.Sizeof(nm[i]))
for j := range nm[i] {
size += int64(unsafe.Sizeof(nm[i][j]))
}
}
return size
}
var nm [][]int64
func main() {
n, m := 1<<15, 1<<10
var ms1, ms2 runtime.MemStats
runtime.ReadMemStats(&ms1)
nm = NewMatrix(n, m)
WriteMatrix(nm)
runtime.ReadMemStats(&ms2)
fmt.Println(runtime.GOARCH, runtime.GOOS)
fmt.Println("Actual: ", ms2.TotalAlloc-ms1.TotalAlloc)
fmt.Println("Estimate:", n*3*8+n*m*8)
fmt.Println("Total: ", ms2.TotalAlloc)
fmt.Println("Size: ", MatrixSize(nm))
// check top VIRT and RES for COMMAND peter
for {
WriteMatrix(nm)
}
}
Output:
<pre>
$ go build peter.go && /usr/bin/time -f "%M KiB" ./peter
amd64 linux
Actual: 269221888
Estimate: 269221888
Total: 269240592
Size: 269221888
^C
Command exited with non-zero status 2
265220 KiB
$
$ top
VIRT 284268 RES 265136 COMMAND peter
</pre>
Is this what you expected?
See MatrixSize
for the correct way to calculate the memory size.
In the infinite loop that allows us to use the top
command, pin the matrix as resident by updating it.
What results do you get when you run this program?
BUG:
Your result from /usr/bin/time
is 1056992 KiB
which too large by a factor of four. It's a bug in your version of /usr/bin/time
, ru_maxrss
is reported in KBytes not pages. My version of Ubuntu has been patched.
References:
Re: GNU time: incorrect results
time-1.7 counts rusage wrong on Linux
“time” 1.7-24 source package in Ubuntu. ru_maxrss
is reported in KBytes not pages. (Closes: #649402)
#649402 - [PATCH] time overestimates max RSS by a factor of 4 - Debian Bug report logs
> Subject: Fix ru_maxrss reporting Author: Richard Kettlewell
> Bug-Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=649402
>
> --- time-1.7.orig/time.c
> +++ time-1.7/time.c
> @@ -392,7 +398,7 @@
> ptok ((UL) resp->ru.ru_ixrss) / MSEC_TO_TICKS (v));
> break;
> case 'M': /* Maximum resident set size. /
> - fprintf (fp, "%lu", ptok ((UL) resp->ru.ru_maxrss));
> + fprintf (fp, "%lu", (UL) resp->ru.ru_maxrss);
> break;
> case 'O': / Outputs. */
> fprintf (fp, "%ld", resp->ru.ru_oublock);
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论