Go 1.3 Garbage collector not releasing server memory back to system
Question
We wrote the simplest possible TCP server (with minor logging) to examine the memory footprint (see tcp-server.go below).
The server simply accepts connections and does nothing. It is being run on an Ubuntu 12.04.4 LTS server (kernel 3.2.0-61-generic) with Go version go1.3 linux/amd64.
The attached benchmarking program (pulse.go) creates, in this example, 10k connections, disconnects them after 30 seconds, repeats this cycle three times, and then continuously repeats small pulses of 1k connections/disconnections. The command used to test was ./pulse -big=10000 -bs=30.
The first attached graph is obtained by recording runtime.ReadMemStats when the number of clients has changed by a multiple of 500, and the second graph is the RES memory size seen by “top” for the server process.
The server starts with a negligible 1.6KB of memory. The “big” pulses of 10k connections then push it to ~60MB (as seen by top), or to about 16MB of “SystemMemory” as reported by ReadMemStats. As expected, when the 10K pulses end, the in-use memory drops, and eventually the program starts releasing memory back to the OS, as evidenced by the grey “Released Memory” line.
The problem is that the System Memory (and correspondingly, the RES memory seen by “top”) never drops significantly (although it drops a little as seen in the second graph).
We would expect that after the 10K pulses end, memory would continue to be released until the RES size is the minimum needed for handling each 1k pulse (8MB RES as seen by “top”, and 2MB in use as reported by runtime.ReadMemStats). Instead, RES stays at about 56MB, and in-use memory never drops from its highest value of 60MB at all.
We want to ensure scalability for irregular traffic with occasional spikes as well as be able to run multiple servers on the same box that have spikes at different times. Is there a way to effectively ensure that as much memory is released back to the system as possible in a reasonable time frame?
Code https://gist.github.com/eugene-bulkin/e8d690b4db144f468bc5 :
server.go:
```go
package main

import (
	"log"
	"net"
	"runtime"
	"sync"
)

var m sync.Mutex
var num_clients = 0
var cycle = 0

func printMem() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	log.Printf("Cycle #%3d: %5d clients | System: %8d Inuse: %8d Released: %8d Objects: %6d\n", cycle, num_clients, ms.HeapSys, ms.HeapInuse, ms.HeapReleased, ms.HeapObjects)
}

func handleConnection(conn net.Conn) {
	//log.Println("Accepted connection:", conn.RemoteAddr())
	m.Lock()
	num_clients++
	if num_clients%500 == 0 {
		printMem()
	}
	m.Unlock()
	buffer := make([]byte, 256)
	for {
		_, err := conn.Read(buffer)
		if err != nil {
			//log.Println("Lost connection:", conn.RemoteAddr())
			err := conn.Close()
			if err != nil {
				log.Println("Connection close error:", err)
			}
			m.Lock()
			num_clients--
			if num_clients%500 == 0 {
				printMem()
			}
			if num_clients == 0 {
				cycle++
			}
			m.Unlock()
			break
		}
	}
}

func main() {
	printMem()
	cycle++
	listener, err := net.Listen("tcp", ":3033")
	if err != nil {
		log.Fatal("Could not listen.")
	}
	for {
		conn, err := listener.Accept()
		if err != nil {
			log.Println("Could not listen to client:", err)
			continue
		}
		go handleConnection(conn)
	}
}
```
pulse.go:
```go
package main

import (
	"flag"
	"log"
	"net"
	"sync"
	"time"
)

var (
	numBig   = flag.Int("big", 4000, "Number of connections in big pulse")
	bigIters = flag.Int("i", 3, "Number of iterations of big pulse")
	bigSep   = flag.Int("bs", 5, "Number of seconds between big pulses")
	numSmall = flag.Int("small", 1000, "Number of connections in small pulse")
	smallSep = flag.Int("ss", 20, "Number of seconds between small pulses")
	linger   = flag.Int("l", 4, "How long connections should linger before being disconnected")
)

var m sync.Mutex
var active_conns = 0
var connections = make(map[net.Conn]bool)

func pulse(n int, linger int) {
	var wg sync.WaitGroup
	log.Printf("Connecting %d client(s)...\n", n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			m.Lock()
			defer m.Unlock()
			defer wg.Done()
			active_conns++
			conn, err := net.Dial("tcp", ":3033")
			if err != nil {
				log.Panicln("Unable to connect: ", err)
				return
			}
			connections[conn] = true
		}()
	}
	wg.Wait()
	if len(connections) != n {
		log.Fatalf("Unable to connect all %d client(s).\n", n)
	}
	log.Printf("Connected %d client(s).\n", n)
	time.Sleep(time.Duration(linger) * time.Second)
	for conn := range connections {
		active_conns--
		err := conn.Close()
		if err != nil {
			log.Panicln("Unable to close connection:", err)
			conn = nil
			continue
		}
		delete(connections, conn)
		conn = nil
	}
	if len(connections) > 0 {
		log.Fatalf("Unable to disconnect all %d client(s) [%d remain].\n", n, len(connections))
	}
	log.Printf("Disconnected %d client(s).\n", n)
}

func main() {
	flag.Parse()
	for i := 0; i < *bigIters; i++ {
		pulse(*numBig, *linger)
		time.Sleep(time.Duration(*bigSep) * time.Second)
	}
	for {
		pulse(*numSmall, *linger)
		time.Sleep(time.Duration(*smallSep) * time.Second)
	}
}
```
Answer 1
Score: 19
First, note that Go, itself, doesn't always shrink its own memory space:
https://groups.google.com/forum/#!topic/Golang-Nuts/vfmd6zaRQVs
> The heap is freed, you can check this using runtime.ReadMemStats(),
> but the process's virtual address space does not shrink -- i.e., your
> program will not return memory to the operating system. On Unix based
> platforms we use a system call to tell the operating system that it
> can reclaim unused parts of the heap, this facility is not available
> on Windows platforms.
But you're not on Windows, right?
Well, this thread is less definitive, but it says:
https://groups.google.com/forum/#!topic/golang-nuts/MC2hWpuT7Xc
> As I understand, memory is returned to the OS about 5 minutes after it has been marked
> as free by the GC. And the GC runs every two minutes tops, if not
> triggered by an increase in memory use. So worst-case would be 7
> minutes to be freed.
>
> In this case, I think that the slice is not marked as freed, but in
> use, so it would never be returned to the OS.
It's possible you weren't waiting long enough for the GC sweep followed by the OS return sweep, which could be up to 7 minutes after the final "big" pulse. You can force this explicitly with debug.FreeOSMemory (package runtime/debug), which runs a garbage collection and then returns as much memory to the OS as it can; keep in mind that it can only return memory the GC finds to be free.
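A minimal sketch of forcing the release by hand, assuming a reasonably recent Go toolchain (the helper names touch and releasedAfterFree are made up for this example): it turns a large buffer into garbage, then compares runtime.MemStats.HeapReleased just before and just after a debug.FreeOSMemory call. It only exercises heap memory; goroutine stacks are a separate matter, as the golang-nuts reply quoted below explains.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// touch allocates size bytes and writes to every byte so the pages are
// really backed by RAM; the slice becomes garbage when touch returns.
func touch(size int) int {
	buf := make([]byte, size)
	sum := 0
	for i := range buf {
		buf[i] = 1
		sum += int(buf[i])
	}
	return sum
}

// releasedAfterFree reports HeapReleased just before and just after an
// explicit debug.FreeOSMemory call, once a large buffer has become garbage.
func releasedAfterFree(size int) (before, after uint64) {
	touch(size)

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	before = ms.HeapReleased

	// FreeOSMemory runs a garbage collection itself and then tries to
	// return as much free memory to the OS as possible.
	debug.FreeOSMemory()

	runtime.ReadMemStats(&ms)
	after = ms.HeapReleased
	return before, after
}

func main() {
	before, after := releasedAfterFree(64 << 20)
	fmt.Printf("HeapReleased before=%d after=%d\n", before, after)
}
```

The absolute numbers depend on the Go version and platform; the point is only that HeapReleased grows once the freed buffer has been handed back.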
(Edit: Note that you can force garbage collection with runtime.GC(), though obviously you need to be careful how often you use it; you may be able to sync it with sudden downward spikes in connections.)
As a slight aside, I can't find an explicit source for this (other than the second thread I posted, where someone mentions the same thing), but I recall it being mentioned several times that not all of the memory Go uses is "real" memory. If it's allocated by the runtime but not actually in use by the program, the OS actually has use of the memory regardless of what top or MemStats says, so the amount of memory the program is "really" using is often heavily overreported.
Edit: As Kostix notes in the comments, and in support of JimB's answer, this question was crossposted on golang-nuts, and we got a rather definitive answer from Dmitry Vyukov:
https://groups.google.com/forum/#!topic/golang-nuts/0WSOKnHGBZE/discussion
> I don't [think] there is a solution today.
> Most of the memory seems to be occupied by goroutine stacks, and we don't release that memory to OS.
> It will be somewhat better in the next release.
So what I outlined only applies to heap variables; memory on a goroutine stack will never be released. How exactly this interacts with my last point, that not all of the allocated system memory shown is "real" memory, remains to be seen.
Answer 2
Score: 6
As LinearZoetrope said, you should wait at least 7 minutes to check how much memory is freed. Sometimes it needs two GC passes, so it could take 9 minutes.
If that doesn't work, or takes too long, you can add a periodic call to FreeOSMemory (there is no need to call runtime.GC() first; debug.FreeOSMemory() does that itself).
Something like this: http://play.golang.org/p/mP7_sMpX4F
```go
package main

import (
	"runtime/debug"
	"time"
)

func main() {
	go periodicFree(1 * time.Minute)
	// Your program goes here
}

// periodicFree forces a GC and returns as much memory to the OS as
// possible once every d.
func periodicFree(d time.Duration) {
	tick := time.Tick(d)
	for _ = range tick {
		debug.FreeOSMemory()
	}
}
```
Keep in mind that every call to FreeOSMemory takes some time (not much), and since Go 1.3 it can run partly in parallel if GOMAXPROCS > 1.
Answer 3
Score: 4
The answer is unfortunately pretty simple: goroutine stacks can't currently be released.
Since you're connecting 10k clients at once, you need 10k goroutines to handle them. Each goroutine has an 8 KB stack, and even if only the first 4 KB page is faulted in, you still need at least 40 MB (10k × 4 KB) of permanent memory to handle your max connections.
There are some pending changes that may help in Go 1.4 (like 4 KB stacks), but it's a fact we have to live with for now.
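To see the per-goroutine stack cost on your own machine, here is a rough sketch (measureStacks is a made-up helper) that parks n goroutines on a channel, much as the server parks one goroutine per connection in conn.Read, and reads runtime.MemStats.StackInuse before, while they are alive, and after they exit and a GC runs. Expect different numbers than Go 1.3 would give: Go 1.4 reduced the initial goroutine stack from 8 KB to 2 KB, and newer runtimes manage stack memory differently, so this shows the order of magnitude rather than reproducing the question's graph.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackInuse returns the runtime's current StackInuse value.
func stackInuse() uint64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.StackInuse
}

// measureStacks parks n goroutines and returns StackInuse before they
// start, while all of them are blocked, and after they exit and a GC runs.
func measureStacks(n int) (before, during, after uint64) {
	before = stackInuse()
	release := make(chan struct{})
	var started, finished sync.WaitGroup
	started.Add(n)
	finished.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer finished.Done()
			started.Done()
			<-release // block, like a goroutine waiting in conn.Read
		}()
	}
	started.Wait()
	during = stackInuse()
	close(release)
	finished.Wait()
	runtime.GC()
	after = stackInuse()
	return before, during, after
}

func main() {
	const n = 10000
	b, d, a := measureStacks(n)
	fmt.Printf("StackInuse before=%d during=%d after=%d\n", b, d, a)
	if d > b {
		fmt.Printf("approx. stack bytes per goroutine: %d\n", (d-b)/n)
	}
}
```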