声明非常大的数组并在stdin上进行迭代时,内存会持续增加。

huangapple go评论84阅读模式
英文:

Continuous memory increase when declaring very large array and iterating over stdin

问题

以下是翻译好的内容:

以下代码声明了两个数组,然后迭代 stdin(只是盲目地迭代文件 - 不与数组交互)。

这导致内存持续增加。

然而,如果我只声明两个数组并休眠 - 内存不会增加。

同样,如果我只迭代 stdin - 内存不会增加。

但是在一起(除了为数组分配的内存),内存会持续增加。

我通过使用 top 工具查看 RES 内存来衡量这一点。

我已经注释掉了 func doSomething() 中的前几行,以显示当它被注释时没有内存增加。取消注释这些行并运行将导致增加。

注意:这是在 go 1.4.2、1.5.3 和 1.6 上运行的。

注意:您需要在至少有 16GB RAM 的机器上重新创建此问题,因为我只观察到在 10 亿大小的数组上出现这个问题。

package main

import (
    "bufio"
    "fmt"
    "io"
    "os"
)

type MyStruct struct {
    arr1 []int
    arr2 []int
}

func (ms *MyStruct) Init(size int, arr1 []int, arr2 []int) error {
    fmt.Printf("初始化 mystruct arr1...\n")
    ms.arr1 = arr1
    if ms.arr1 == nil {
        ms.arr1 = make([]int, size, size)
    }
    fmt.Printf("初始化 mystruct arr2...\n")
    ms.arr2 = arr2
    if ms.arr2 == nil {
        ms.arr2 = make([]int, size, size)
    }
    fmt.Printf("初始化完成...\n")
    for i := 0; i < size; i++ {
        ms.arr1[i] = 0
        ms.arr2[i] = 0
    }
    return nil
}

func doSomething() error {
    fmt.Printf("开始...\n")
    fmt.Printf("分配内存\n")
    /* 注意:当取消注释时会导致内存增加
    ms := &MyStruct{}
    size := 1000000000
    ms.Init(size, nil, nil)
    */

    fmt.Printf("分配完成..%d %d\n", len(ms.arr1), len(ms.arr2))

    fmt.Printf("从 stdin 读取...\n")
    reader := bufio.NewReader(os.Stdin)

    var line string
    var readErr error
    var lineNo int = 0
    for {
        if lineNo%1000000 == 0 {
            fmt.Printf("读取 %d 行...\n", lineNo)
        }
        lineNo++

        line, readErr = reader.ReadString('\n')
        if readErr != nil {
            fmt.Printf("在 %s 处中断\n", line)
            break
        }
    }

    if readErr == io.EOF {
        readErr = nil
    }

    if readErr != nil {
        return readErr
    }

    return nil
}

func main() {
    if err := doSomething(); err != nil {
        panic(err)
    }
    fmt.Printf("完成...\n")
}
  1. 这是我的代码的问题吗?还是 go 系统做了一些意外的事情?
  2. 如果是后者,我该如何调试这个问题?

为了更容易复制,这里有好的情况(上述代码的注释部分)和坏的情况(取消注释部分)的 pastebin 文件。

wget http://pastebin.com/raw/QfG22xXk -O badcase.go
yes "1234567890" | go run badcase.go

wget http://pastebin.com/raw/G9xS2fKy -O goodcase.go
yes "1234567890" | go run goodcase.go
英文:

The following code declares two arrays, and then iterates over stdin ( just blindly iterates over the file - no interaction with the arrays ).

This is causing continuous increase in memory.

However, if I just declare two arrays and sleep - there is no increase in memory.

Similarly, if I just iterate over stdin - there is no increase in memory.

But together ( apart from the memory allocated for the arrays) there is a continuous increase.

I measure this by looking at the RES memory using top tool.

I have commented out the first few lines in func doSomething() to show that there is no memory increase when it is commented. Uncommenting the lines and running will cause an increase.

> NOTE: This was run on go 1.4.2, 1.5.3 and 1.6
>
> NOTE: You will need to recreate this on a machine with at least 16GB RAM as I have observed it only on the array size of 1 billion.

<!-- language: lang-go -->

package main
import (
&quot;bufio&quot;
&quot;fmt&quot;
&quot;io&quot;
&quot;os&quot;
)
type MyStruct struct {
arr1 []int
arr2 []int
}
func (ms *MyStruct) Init(size int, arr1 []int, arr2 []int) error {
fmt.Printf(&quot;initializing mystruct arr1...\n&quot;)
ms.arr1 = arr1
if ms.arr1 == nil {
ms.arr1 = make([]int, size, size)
}
fmt.Printf(&quot;initializing mystruct arr2...\n&quot;)
ms.arr2 = arr2
if ms.arr2 == nil {
ms.arr2 = make([]int, size, size)
}
fmt.Printf(&quot;done initializing ...\n&quot;)
for i := 0; i &lt; size; i++ {
ms.arr1[i] = 0
ms.arr2[i] = 0
}
return nil
}
func doSomething() error {
fmt.Printf(&quot;starting...\n&quot;)
fmt.Printf(&quot;allocating\n&quot;)
/* NOTE WHEN UNCOMMENTED CAUSES MEMORY INCREASE 
ms := &amp;MyStruct{}
size := 1000000000
ms.Init(size, nil, nil)
*/
fmt.Printf(&quot;finished allocating..%d %d\n&quot;, len(ms.arr1), len(ms.arr2))
fmt.Printf(&quot;reading from stdin...\n&quot;)
reader := bufio.NewReader(os.Stdin)
var line string
var readErr error
var lineNo int = 0
for {
if lineNo%1000000 == 0 {
fmt.Printf(&quot;read %d lines...\n&quot;, lineNo)
}
lineNo++
line, readErr = reader.ReadString(&#39;\n&#39;)
if readErr != nil {
fmt.Printf(&quot;break at %s\n&quot;, line)
break
}
}
if readErr == io.EOF {
readErr = nil
}
if readErr != nil {
return readErr
}
return nil
}
func main() {
if err := doSomething(); err != nil {
panic(err)
}
fmt.Printf(&quot;done...\n&quot;)
}
  1. Is this an issue with my code ? Or is the go system doing something unintended ?
  2. If its the latter, how can I go about debugging this ?

To make it easier to replicate here are pastebin files for good case ( commented portion of the above code) and bad case ( with uncommented portion )

<!-- language: lang-bash -->

wget http://pastebin.com/raw/QfG22xXk -O badcase.go
yes &quot;1234567890&quot; | go run badcase.go
wget http://pastebin.com/raw/G9xS2fKy -O goodcase.go
yes &quot;1234567890&quot; | go run goodcase.go

答案1

得分: 2

谢谢[Volker][1]对你上面评论的回复。我想把调试过程作为答案记录下来。

RES top / htop 只告诉你进程级别的内存使用情况。GODEBUG="gctrace=1" 可以提供更多关于内存处理的信息。

设置 gctrace 后运行程序,得到以下结果:

root@localhost ~ # yes &quot;12345678901234567890123456789012&quot; | GODEBUG=&quot;gctrace=1&quot; go run badcase.go
starting...
allocating
initializing mystruct arr1...
initializing mystruct arr2...
gc 1 @0.050s 0%: 0.19+0.23+0.068 ms clock, 0.58+0.016/0.16/0.25+0.20 ms cpu, 7629-&gt;7629-&gt;7629 MB, 7630 MB goal, 8 P
done initializing ...
gc 2 @0.100s 0%: 0.070+2515+0.23 ms clock, 0.49+0.025/0.096/0.24+1.6finished allocating..1000000000 1000000000
ms cpu, 15258-&gt;15258reading from stdin...
-&gt;15258 MB, 15259read 0 lines...
MB goal, 8 P
gc 3 @2.620s 0%: 0.009+0.32+0.23 ms clock, 0.072+0/0.20/0.11+1.8 ms cpu, 15259-&gt;15259-&gt;15258 MB, 30517 MB goal, 8 P
read 1000000 lines...
read 2000000 lines...
read 3000000 lines...
read 4000000 lines...
....
read 51000000 lines...
read 52000000 lines...
read 53000000 lines...
read 54000000 lines...

> 这是什么意思?

如你所见,已经有一段时间没有触发垃圾回收(gc)了。这意味着从 reader.ReadString 生成的垃圾没有被收集和释放。

> 为什么垃圾回收器没有回收这些垃圾?

根据[Go的垃圾回收机制][2]:

> 相反,我们提供了一个名为 GOGC 的单一参数。该值控制堆的总大小与可达对象大小的比例。默认值为 100,意味着在上一次回收后,堆的总大小是可达对象大小的两倍。

由于没有设置 GOGC,使用的是默认值 100。因此,只有当堆大小达到约 32GB 时,垃圾回收器才会触发(因为最初的两个数组给出了 16GB 的堆空间,只有堆大小翻倍时,垃圾回收器才会触发)。

> 如何更改这个值?

尝试将 GOGC 设置为 25。

设置 GOGC 为 25 后:

root@localhost ~ # yes &quot;12345678901234567890123456789012&quot; | GODEBUG=&quot;gctrace=1&quot; GOGC=25 go run badcase.go
starting...
allocating
initializing mystruct arr1...
initializing mystruct arr2...
gc 1 @0.051s 0%: 0.14+0.30+0.11 ms clock, 0.42+0.016/0.31/0.094+0.35 ms cpu, 7629-&gt;7629-&gt;7629 MB, 7630 MB goal, 8 P
done initializing ...
finished allocating..1000000000 1000000000
gc 2 @0.102s reading from stdin...
12%: 0.058+2480+0.26 ms clock, 0.40+0.022/2480/0.10+1.8 ms cpu, 15258-&gt;15258-&gt;15258 MB, 15259 MB goal, 8 P
read 0 lines...
gc 3 @2.584s 12%: 0.009+0.20+0.22 ms clock, 0.075+0/0.24/0.046+1.8 ms cpu, 15259-&gt;15259-&gt;15258 MB, 19073 MB goal, 8 P
read 1000000 lines...
read 2000000 lines...
read 3000000 lines...
read 4000000 lines...
....
read 19000000 lines...
read 20000000 lines...
gc 4 @6.539s 4%: 0.019+2.3+0.23 ms clock, 0.15+0/2.1/12+1.8 ms cpu, 17166-&gt;17166-&gt;15258 MB, 19073 MB goal, 8 P

如你所见,另一个垃圾回收被触发了。

> 但是 top/htop 显示的是稳定在约 20GB,而不是计算出的 16GB。

垃圾回收器不一定要将内存归还给操作系统。它有时会保留一部分内存以便将来高效使用。它不必一直向操作系统请求内存,然后再归还给操作系统 - 多出的 4GB 是它的自由空间,在再次向操作系统请求之前可以使用。

[1]: https://stackoverflow.com/users/1266756/volker "Volker"
[2]: https://blog.golang.org/go15gc

英文:

Thank you [Volker][1] for your above comments. I wanted to capture the process of debugging this as an answer.

The RES top / htop just tells you at a process level what is going on with memory. GODEBUG="gctrace=1" gives you more insight into how the memory is being handled.

A simple run with gctrace set gives the following

root@localhost ~ # yes &quot;12345678901234567890123456789012&quot; | GODEBUG=&quot;gctrace=1&quot; go run badcase.go
starting...
allocating
initializing mystruct arr1...
initializing mystruct arr2...
gc 1 @0.050s 0%: 0.19+0.23+0.068 ms clock, 0.58+0.016/0.16/0.25+0.20 ms cpu, 7629-&gt;7629-&gt;7629 MB, 7630 MB goal, 8 P
done initializing ...
gc 2 @0.100s 0%: 0.070+2515+0.23 ms clock, 0.49+0.025/0.096/0.24+1.6finished allocating..1000000000 1000000000
ms cpu, 15258-&gt;15258reading from stdin...
-&gt;15258 MB, 15259read 0 lines...
MB goal, 8 P
gc 3 @2.620s 0%: 0.009+0.32+0.23 ms clock, 0.072+0/0.20/0.11+1.8 ms cpu, 15259-&gt;15259-&gt;15258 MB, 30517 MB goal, 8 P
read 1000000 lines...
read 2000000 lines...
read 3000000 lines...
read 4000000 lines...
....
read 51000000 lines...
read 52000000 lines...
read 53000000 lines...
read 54000000 lines...

> What does this mean ?

As you can see, the gc hasn't been called for a while now. This means that all the garbage generated from reader.ReadString hasn't been collected and free'd.

> Why isn't the garbage collector collecting this garbage ?

From [The go gc][2]

> Instead we provide a single knob, called GOGC. This value controls
> the total size of the heap relative to the size of reachable objects.
> The default value of 100 means that total heap size is now 100% bigger
> than (i.e., twice) the size of the reachable objects after the last
> collection.

Since GOGC wasn't set - the default was 100%. So, it would have collected the garbage only when it reached ~32GB. ( Since initially the two arrays give you 16GB of heap space - only when heap doubles will the gc trigger ).

> How can I change this ?
Try setting the GOGC=25.

With the GOGC as 25

root@localhost ~ # yes &quot;12345678901234567890123456789012&quot; | GODEBUG=&quot;gctrace=1&quot; GOGC=25 go run badcase.go
starting...
allocating
initializing mystruct arr1...
initializing mystruct arr2...
gc 1 @0.051s 0%: 0.14+0.30+0.11 ms clock, 0.42+0.016/0.31/0.094+0.35 ms cpu, 7629-&gt;7629-&gt;7629 MB, 7630 MB goal, 8 P
done initializing ...
finished allocating..1000000000 1000000000
gc 2 @0.102s reading from stdin...
12%: 0.058+2480+0.26 ms clock, 0.40+0.022/2480/0.10+1.8 ms cpu, 15258-&gt;15258-&gt;15258 MB, 15259 MB goal, 8 P
read 0 lines...
gc 3 @2.584s 12%: 0.009+0.20+0.22 ms clock, 0.075+0/0.24/0.046+1.8 ms cpu, 15259-&gt;15259-&gt;15258 MB, 19073 MB goal, 8 P
read 1000000 lines...
read 2000000 lines...
read 3000000 lines...
read 4000000 lines...
....
read 19000000 lines...
read 20000000 lines...
gc 4 @6.539s 4%: 0.019+2.3+0.23 ms clock, 0.15+0/2.1/12+1.8 ms cpu, 17166-&gt;17166-&gt;15258 MB, 19073 MB goal, 8 P

As you can see, another gc was triggered.

> But top/htop shows it stable at ~20 GB instead of the calculated 16GB.

The garbage collector doesn't "have" to give it back to the OS. It will sometimes keep it to use efficiently for the future. It doesn't have to keep taking from the OS and giving back - The extra 4 gb is in its pool of free space to use before asking the OS again.

[1]: https://stackoverflow.com/users/1266756/volker "Volker"
[2]: https://blog.golang.org/go15gc

huangapple
  • 本文由 发表于 2016年2月29日 16:16:26
  • 转载请务必保留本文链接:https://go.coder-hub.com/35694703.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定