Why does connection pool size keep increasing with Golang HTTP client?

Question

I am building a health check crawler for a huge list of domains. I have a Golang script that starts ~256 goroutines, which make requests to the domains in the list. All of them share the same client, with the following transport configuration:

// init func
this.client = &http.Client{
	Transport: &http.Transport{
		ForceAttemptHTTP2:   true,
		TLSHandshakeTimeout: TLSHandShakeTimeout,
		TLSClientConfig:     &tls.Config{InsecureSkipVerify: true},
		MaxConnsPerHost:     -1,
		DisableKeepAlives:   true,
	},
	Timeout: RequestTimeout,
}
...
// crawler func
req, err := http.NewRequestWithContext(this.ctx, "GET", opts.Url, nil)
if err != nil {
	return nil, errors.Wrap(err, "failed to create request")
}

res, err := this.client.Do(req)
if err != nil {
	return nil, err
}
defer res.Body.Close()
...

I ran netstat -anp | wc -l and can see more than 2000 connections in the TIME_WAIT state.

Answer 1

Score: 2

The HTTP transport used by http.Client starts two goroutines for each connection: one reads from it and the other writes to it. So with thousands of domains, there can be thousands of goroutines.

Because DisableKeepAlives is set to true, each connection is closed once its HTTP response is complete. TIME_WAIT is the normal TCP state that the side closing the connection first enters afterwards.

However, the default TIME_WAIT timeout on Linux is 60 seconds, so a large number of sockets in TIME_WAIT can build up and cause connection problems on the machine running the probe or crawler.

To reduce the number of TIME_WAIT sockets, the SO_LINGER option can help. When it is enabled with a linger time of 0, it replaces the default graceful TCP close: closing the connection sends an RST to the peer, and the connection skips the TIME_WAIT state. Keep in mind that such an abortive close discards any data that has not been sent yet.

More discussion can be found here: https://stackoverflow.com/questions/3757289/when-is-tcp-option-so-linger-0-required

Sample

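	// Needs the "net", "net/http", and "syscall" packages (Unix-like systems).
	dialer := &net.Dialer{
		// Control runs after the socket is created and before it connects.
		Control: func(network, address string, conn syscall.RawConn) error {
			var opterr error
			if err := conn.Control(func(fd uintptr) {
				// Onoff: 1 with Linger: 0 makes Close send an RST; a zero-valued
				// Linger would leave SO_LINGER switched off.
				l := &syscall.Linger{Onoff: 1, Linger: 0}
				opterr = syscall.SetsockoptLinger(int(fd), syscall.SOL_SOCKET, syscall.SO_LINGER, l)
			}); err != nil {
				return err
			}
			return opterr
		},
	}
	client := &http.Client{
		Transport: &http.Transport{
			DialContext: dialer.DialContext,
		},
	}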

Moreover, here is another SO_LINGER use case in EaseProbe. It is a simple, standalone, and lightweight tool that can do health/status checking.

huangapple
  • Posted on 2022-10-10 10:59:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74009845.html