Go doesn't release memory after http.Get

Question

I am loading web pages using a simple worker pool, reading URLs from a file as I go. But this small program slowly allocates as much memory as my server has, until the OOM killer stops it. It looks like resp.Body.Close() doesn't free the memory for the body text (memory usage is roughly downloaded pages × average page size). How can I force Go to release the memory allocated for the body HTML?

package main

import (
	"bufio"
	"fmt"
	"io/ioutil"
	"net/http"
	"os"
	"strings"
	"sync"
)

func worker(linkChan chan string, wg *sync.WaitGroup) {
	defer wg.Done()

	for url := range linkChan {
		// Fetch the body text
		resp, err := http.Get(url)
		if err != nil {
			fmt.Printf("失败的URL:%s\n", url)
			continue
		}
		body, err := ioutil.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			fmt.Printf("失败的URL:%s\n", url)
			continue
		}
		// Test the page body
		has_rem_code := strings.Contains(string(body), "googleadservices.com/pagead/conversion.js")
		fmt.Printf("完成的URL:%s\t%t\n", url, has_rem_code)
	}
}

func main() {
	// Create the worker pool
	lCh := make(chan string, 30)
	wg := new(sync.WaitGroup)

	for i := 0; i < 30; i++ {
		wg.Add(1)
		go worker(lCh, wg)
	}

	// Open the file with URLs
	file, err := os.Open("./tmp/new.csv")
	if err != nil {
		panic(err)
	}
	defer file.Close()
	reader := bufio.NewReader(file)

	// Process the URLs
	for href, _, err := reader.ReadLine(); err == nil; href, _, err = reader.ReadLine() {
		lCh <- string(href)
	}

	close(lCh)
	wg.Wait()
}

Here is some output from the pprof tool:

      flat  flat%   sum%        cum   cum%
   34.63MB 29.39% 29.39%    34.63MB 29.39%  bufio.NewReaderSize
      30MB 25.46% 54.84%       30MB 25.46%  net/http.(*Transport).getIdleConnCh
   23.09MB 19.59% 74.44%    23.09MB 19.59%  bufio.NewWriter
   11.63MB  9.87% 84.30%    11.63MB  9.87%  net/http.(*Transport).putIdleConn
    6.50MB  5.52% 89.82%     6.50MB  5.52%  main.main
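
(For reference, a heap profile like this can be collected with the standard net/http/pprof package. The snippet below is a sketch added for illustration, since the original post doesn't say how the profile was taken; port 6060 is an arbitrary choice.)

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func init() {
	go func() {
		// Then inspect the heap with:
		//   go tool pprof http://localhost:6060/debug/pprof/heap
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}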

This looks like issue https://github.com/golang/go/issues/5794, but that was fixed two years ago.


Answer 1

Score: 5

Found the answer in a thread on golang-nuts. http.Transport caches connections for future reuse when requesting the same host, which in my case (hundreds of thousands of different hosts) caused the memory bloat. Disabling KeepAlives entirely solves the problem.

Working code:

func worker(linkChan chan string, wg *sync.WaitGroup) {
	defer wg.Done()

	var transport http.RoundTripper = &http.Transport{
		DisableKeepAlives: true,
	}

	c := &http.Client{Transport: transport}

	for url := range linkChan {
		// Fetch the body text
		resp, err := c.Get(url)
		if err != nil {
			fmt.Printf("失败的URL:%s\n", url)
			continue
		}
		body, err := ioutil.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			fmt.Printf("失败的URL:%s\n", url)
			continue
		}
		// Test the page body
		has_rem_code := strings.Contains(string(body), "googleadservices.com/pagead/conversion.js")
		fmt.Printf("完成的URL:%s\t%t\n", url, has_rem_code)
	}
}

Hope this helps!
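
If disabling keep-alives entirely feels too heavy-handed, a gentler variant is to share one Transport across all workers, keep its per-host idle pool minimal, and flush idle connections periodically. This is only a sketch added for illustration, not part of the original answer: newSharedClient is a hypothetical helper and the 30-second interval is an arbitrary choice, though MaxIdleConnsPerHost and CloseIdleConnections are standard net/http APIs.

import (
	"net/http"
	"time"
)

// newSharedClient is a hypothetical helper that builds one client to be
// shared by all workers, instead of one Transport per goroutine.
func newSharedClient() *http.Client {
	transport := &http.Transport{
		MaxIdleConnsPerHost: 1, // default is 2; keep the per-host idle pool small
	}
	// Periodically drop idle connections so the Transport's internal
	// per-host state doesn't grow with the number of distinct hosts.
	go func() {
		for range time.Tick(30 * time.Second) {
			transport.CloseIdleConnections()
		}
	}()
	return &http.Client{Transport: transport}
}

Whether this beats DisableKeepAlives depends on how often the same host repeats in the URL list; with mostly unique hosts, keep-alive connections are never reused anyway, so the original answer's approach is simpler.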

