"Memory used" metric: Go tool pprof vs docker stats
Question
I wrote a Go application that runs in each of my Docker containers. The instances communicate with each other using protobufs over TCP and UDP, and I use HashiCorp's memberlist library to discover the containers in my network.
In docker stats I see the memory usage increasing linearly, so I am trying to find leaks in my application.
Since the application runs continuously, I am using HTTP pprof to inspect the live application in one of the containers.
I see that runtime.MemStats.Sys stays constant even though the figure in docker stats increases linearly.
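For reference, a minimal sketch of how these runtime figures can be read in-process for comparison with docker stats. The heapSummary type and the helper names are illustrative assumptions and not part of the application described here. Note that runtime.MemStats only covers memory managed by the Go runtime; it does not include kernel page cache charged to the container's cgroup.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapSummary (hypothetical helper type) keeps the runtime.MemStats fields
// that are most useful when comparing against an external memory figure.
type heapSummary struct {
	Sys          uint64 // total bytes obtained from the OS by the Go runtime
	HeapInuse    uint64 // bytes in in-use heap spans
	HeapIdle     uint64 // bytes in idle (unused) heap spans
	HeapReleased uint64 // bytes of idle heap returned to the OS
}

// readHeapSummary takes a snapshot of the runtime's memory statistics.
func readHeapSummary() heapSummary {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return heapSummary{
		Sys:          m.Sys,
		HeapInuse:    m.HeapInuse,
		HeapIdle:     m.HeapIdle,
		HeapReleased: m.HeapReleased,
	}
}

// toMiB converts a byte count to mebibytes.
func toMiB(b uint64) float64 {
	return float64(b) / (1 << 20)
}

func main() {
	s := readHeapSummary()
	fmt.Printf("Sys=%.2f MiB HeapInuse=%.2f MiB HeapIdle=%.2f MiB HeapReleased=%.2f MiB\n",
		toMiB(s.Sys), toMiB(s.HeapInuse), toMiB(s.HeapIdle), toMiB(s.HeapReleased))
}
```

Logging this line periodically next to the docker stats value makes it easier to see whether the growth is inside the Go runtime or outside it.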
My --inuse_space is around 1MB, and --alloc_space of course keeps increasing over time. Here is a sample of alloc_space:
root@n3:/app# go tool pprof --alloc_space main http://localhost:8080/debug/pprof/heap
Fetching profile from http://localhost:8080/debug/pprof/heap
Saved profile in /root/pprof/pprof.main.localhost:8080.alloc_objects.alloc_space.005.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top --cum
1024.11kB of 10298.19kB total ( 9.94%)
Dropped 8 nodes (cum <= 51.49kB)
Showing top 10 nodes out of 34 (cum >= 1536.07kB)
      flat  flat%   sum%        cum   cum%
         0     0%     0% 10298.19kB   100%  runtime.goexit
         0     0%     0%  6144.48kB 59.67%  main.Listener
         0     0%     0%  3072.20kB 29.83%  github.com/golang/protobuf/proto.Unmarshal
  512.10kB  4.97%  4.97%  3072.20kB 29.83%  github.com/golang/protobuf/proto.UnmarshalMerge
         0     0%  4.97%  2560.17kB 24.86%  github.com/hashicorp/memberlist.(*Memberlist).triggerFunc
         0     0%  4.97%  2560.10kB 24.86%  github.com/golang/protobuf/proto.(*Buffer).Unmarshal
         0     0%  4.97%  2560.10kB 24.86%  github.com/golang/protobuf/proto.(*Buffer).dec_struct_message
         0     0%  4.97%  2560.10kB 24.86%  github.com/golang/protobuf/proto.(*Buffer).unmarshalType
  512.01kB  4.97%  9.94%  2048.23kB 19.89%  main.SaveAsFile
         0     0%  9.94%  1536.07kB 14.92%  reflect.New
(pprof) list main.Listener
Total: 10.06MB
ROUTINE ======================== main.Listener in /app/listener.go
         0        6MB (flat, cum) 59.67% of Total
         .          .     24:    l.SetReadBuffer(MaxDatagramSize)
         .          .     25:    defer l.Close()
         .          .     26:    m := new(NewMsg)
         .          .     27:    b := make([]byte, MaxDatagramSize)
         .          .     28:    for {
         .   512.02kB     29:        n, src, err := l.ReadFromUDP(b)
         .          .     30:        if err != nil {
         .          .     31:            log.Fatal("ReadFromUDP failed:", err)
         .          .     32:        }
         .   512.02kB     33:        log.Println(n, "bytes read from", src)
         .          .     34:        //TODO remove later. For testing Fetcher only
         .          .     35:        if rand.Intn(100) < MCastDropPercent {
         .          .     36:            continue
         .          .     37:        }
         .        3MB     38:        err = proto.Unmarshal(b[:n], m)
         .          .     39:        if err != nil {
         .          .     40:            log.Fatal("protobuf Unmarshal failed", err)
         .          .     41:        }
         .          .     42:        id := m.GetHead().GetMsgId()
         .          .     43:        log.Println("CONFIG-UPDATE-RECEIVED { \"update_id\" =", id, "}")
         .          .     44:        //TODO check whether value already exists in store?
         .          .     45:        store.Add(id)
         .        2MB     46:        SaveAsFile(id, b[:n], StoreDir)
         .          .     47:        m.Reset()
         .          .     48:    }
         .          .     49:}
(pprof)
I have verified that no goroutine leak is happening, using http://<n3-ipaddress>:8080/debug/pprof/goroutine?debug=1
Please comment on why docker stats shows a different picture (linearly increasing memory):
CONTAINER   CPU %   MEM USAGE / LIMIT       MEM %   NET I/O          BLOCK I/O        PIDS
n3          0.13%   19.73 MiB / 31.36 GiB   0.06%   595 kB / 806 B   0 B / 73.73 kB   14
If I run it overnight, this memory grows to around 250MB. I have not run it longer than that, but I feel it should have reached a plateau instead of increasing linearly.
Answer 1
Score: 8
docker stats reports the memory usage statistics from cgroups (see https://docs.docker.com/engine/admin/runmetrics/).
The "outdated but useful" kernel documentation (https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt) says:
> 5.5 usage_in_bytes
>
> For efficiency, as other kernel components, memory cgroup uses some
> optimization to avoid unnecessary cacheline false sharing.
> usage_in_bytes is affected by the method and doesn't show 'exact'
> value of memory (and swap) usage, it's a fuzz value for efficient
> access. (Of course, when necessary, it's synchronized.) If you want to
> know more exact memory usage, you should use RSS+CACHE(+SWAP) value in
> memory.stat(see 5.2).
Page cache and RSS are both included in the usage_in_bytes number, so if the container does file I/O, the memory usage figure will increase. However, when usage reaches the container's memory limit, the kernel reclaims some of the memory that is unused. Hence, when I added a memory limit to my container, I could observe memory being reclaimed and reused once the limit was hit. The container's processes are not killed unless there is no memory left to reclaim and an OOM error occurs. For anyone concerned about the numbers shown in docker stats, the easy way is to check the detailed stats available in cgroups at the path: /sys/fs/cgroup/memory/docker/<longid>/
This shows all the memory metrics in detail in memory.stat and the other memory.* files.
If you want to limit the resources used by a Docker container in the "docker run" command, you can do so by following this reference: https://docs.docker.com/engine/admin/resource_constraints/
Since I am using docker-compose, I did it by adding a line to my docker-compose.yml file under the service I wanted to limit:
> mem_limit: 32m
where m stands for megabytes.