Socket can't identify protocol (socket leak)

Question

I have a Go 1.5.1 process. When I run /usr/sbin/lsof -p against it, I see many sockets reported as "can't identify protocol":

monitor_ 13105 root  101u  sock      0,6      0t0 16960100 can't identify protocol
monitor_ 13105 root  102u  sock      0,6      0t0 21552427 can't identify protocol
monitor_ 13105 root  103u  sock      0,6      0t0 17565091 can't identify protocol
monitor_ 13105 root  104u  sock      0,6      0t0 18476870 can't identify protocol

proc status/limit/fd

[root@Monitor_q ~]# cat /proc/13105/status 
Name:	monitor_client
State:	S (sleeping)
Tgid:	13105
Pid:	13105
PPid:	13104
TracerPid:	0
Uid:	0	0	0	0
Gid:	0	0	0	0
Utrace:	0
FDSize:	16384
Groups:	
...


[root@Monitor_q ~]# cat /proc/13105/limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            10485760             unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             3870                 3870                 processes 
Max open files            9999                 9999                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       3870                 3870                 signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us

[root@Monitor_q ~]# ll /proc/13105/fd/
lrwx------ 1 root root 64 Dec  7 00:15 8382 -> socket:[52023221]
lrwx------ 1 root root 64 Dec  7 00:15 8383 -> socket:[51186627]
lrwx------ 1 root root 64 Dec  7 00:15 8384 -> socket:[51864232]
lrwx------ 1 root root 64 Dec  7 00:15 8385 -> socket:[52435453]
lrwx------ 1 root root 64 Dec  7 00:15 8386 -> socket:[51596071]
lrwx------ 1 root root 64 Dec  7 00:15 8387 -> socket:[52767667]
lrwx------ 1 root root 64 Dec  7 00:15 8388 -> socket:[52090632]
lrwx------ 1 root root 64 Dec  7 00:15 8389 -> socket:[51739068]
lrwx------ 1 root root 64 Dec  7 00:15 839 -> socket:[22963529]
lrwx------ 1 root root 64 Dec  7 00:15 8390 -> socket:[52023223]
lrwx------ 1 root root 64 Dec  7 00:15 8391 -> socket:[52560389]
lrwx------ 1 root root 64 Dec  7 00:15 8392 -> socket:[52402565]
...

However, netstat -a shows nothing that corresponds to these sockets.

What are these sockets, and how can I find out what they are used for?

monitor_client.go

package main

import (
	"crypto/tls"
	"encoding/json"
	"fmt"
	"log"
	"net"
	"net/http"
	nurl "net/url"
	"strconv"
	"strings"
	"syscall"
	"time"
)

type Result struct {
	Error      string        `json:"error"`
	HttpStatus int           `json:"http_status"`
	Stime      time.Duration `json:"http_time"`
}

//http://stackoverflow.com/questions/20990332/golang-http-timeout-and-goroutines-accumulation
//http://3.3.3.3/http?host=3.2.4.2&servername=a.test&path=/&port=33&timeout=5&scheme=http
func MonitorHttp(w http.ResponseWriter, r *http.Request) {
	var host, servername, path, port, scheme string
	var timeout int
	u, err := nurl.Parse(r.RequestURI)
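	if err != nil {
		// Report the bad request to the caller instead of exiting the whole process.
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}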
	if host = u.Query().Get("host"); host == "" {
		host = "127.0.0.0"
	}
	if servername = u.Query().Get("servername"); servername == "" {
		servername = "localhost"
	}
	if path = u.Query().Get("path"); path == "" {
		path = "/"
	}
	if port = u.Query().Get("port"); port == "" {
		port = "80"
	}
	if scheme = u.Query().Get("scheme"); scheme == "" {
		scheme = "http"
	}

	if timeout, _ = strconv.Atoi(u.Query().Get("timeout")); timeout == 0 {
		timeout = 5
	}

	//log.Printf("(host)=%s (servername)=%s (path)=%s (port)=%s (timeout)=%d", host, servername, path, port, timeout)

	w.Header().Set("Content-Type", "application/json")

	res := httptool(host, port, servername, scheme, path, timeout)
	result, _ := json.Marshal(res)
	fmt.Fprintf(w, "%s", result)
}

func httptool(ip, port, servername, scheme, path string, timeout int) Result {

	var result Result
	startTime := time.Now()
	host := ip + ":" + port

	transport := &http.Transport{
		TLSClientConfig:   &tls.Config{InsecureSkipVerify: true},
		DisableKeepAlives: true,
	}

	dialer := net.Dialer{
		Timeout:   time.Duration(timeout) * time.Second,
		KeepAlive: 0 * time.Second,
	}
	transport.Dial = func(network, address string) (net.Conn, error) {
		return dialer.Dial(network, address)
	}

	client := &http.Client{
		Transport: transport,
	}
	rawquery := ""
	url := fmt.Sprintf("%s://%s%s%s", scheme, host, path, rawquery)
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		result.HttpStatus = -1
		errs := strings.Split(err.Error(), ": ")
		result.Error = errs[len(errs)-1]
		result.Stime = time.Now().Sub(startTime) / time.Millisecond
		return result
	}
	req.Header.Set("User-Agent", "monitor worker")
	req.Header.Set("Connection", "close")
	req.Host = servername
	resp, err := client.Do(req)
	//https://github.com/Basiclytics/neverdown/blob/master/check.go
	if err != nil {
		nerr, ok := err.(*nurl.Error)
		if ok {
			switch cerr := nerr.Err.(type) {
			case *net.OpError:
				switch cerr.Err.(type) {
				case *net.DNSError:
					errs := strings.Split(cerr.Error(), ": ")
					result.Error = "dns: " + errs[len(errs)-1]
				default:
					errs := strings.Split(cerr.Error(), ": ")
					result.Error = "server: " + errs[len(errs)-1]
				}
			default:
				switch nerr.Err.Error() {
				case "net/http: request canceled while waiting for connection":
					errs := strings.Split(cerr.Error(), ": ")
					result.Error = "timeout: " + errs[len(errs)-1]

				default:
					errs := strings.Split(cerr.Error(), ": ")
					result.Error = "unknown: " + errs[len(errs)-1]
				}
			}

		} else {
			result.Error = "unknown: " + err.Error()
		}
		result.HttpStatus = -2
		result.Stime = time.Now().Sub(startTime) / time.Millisecond
		return result
	}
	resp.Body.Close()
	result.HttpStatus = resp.StatusCode
	result.Error = "noerror"
	result.Stime = time.Now().Sub(startTime) / time.Millisecond //spend time (ms)
	return result
}

func setRlimit() {
	var rLimit syscall.Rlimit
	err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rLimit)
	if err != nil {
		log.Printf("Unable to obtain rLimit: %v", err)
	}
	if rLimit.Cur < rLimit.Max {
		rLimit.Max = 9999
		rLimit.Cur = 9999
		err = syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rLimit)
		if err != nil {
			log.Printf("Unable to increase number of open files limit: %v", err)
		}
	}
}

func main() {
	setRlimit()
	s := &http.Server{
		Addr:         ":59059",
		ReadTimeout:  7 * time.Second,
		WriteTimeout: 7 * time.Second,
	}
	http.HandleFunc("/http", MonitorHttp)

	log.Fatal(s.ListenAndServe())
}

Answer 1

Score: 2

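There are a couple of points here.

First, I was unable to reproduce the behavior. That said, "can't identify protocol" in lsof output is usually tied to sockets that were not closed properly.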
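One way to watch for such a leak from inside the process is to count the socket entries under /proc/self/fd over time. The following is a minimal sketch, assuming Linux and the `socket:[inode]` link format shown in the question; the helper names `socketInode` and `countSocketFDs` are made up for this illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// socketInode parses a /proc/<pid>/fd symlink target such as
// "socket:[52023221]" and returns the socket inode number.
func socketInode(target string) (uint64, bool) {
	const prefix = "socket:["
	if !strings.HasPrefix(target, prefix) || !strings.HasSuffix(target, "]") {
		return 0, false
	}
	n, err := strconv.ParseUint(target[len(prefix):len(target)-1], 10, 64)
	if err != nil {
		return 0, false
	}
	return n, true
}

// countSocketFDs counts the entries in fdDir (for example
// "/proc/self/fd") whose symlink target is a socket.
func countSocketFDs(fdDir string) (int, error) {
	d, err := os.Open(fdDir)
	if err != nil {
		return 0, err
	}
	defer d.Close()

	names, err := d.Readdirnames(-1)
	if err != nil {
		return 0, err
	}
	count := 0
	for _, name := range names {
		target, err := os.Readlink(filepath.Join(fdDir, name))
		if err != nil {
			continue // the descriptor may have been closed in the meantime
		}
		if _, ok := socketInode(target); ok {
			count++
		}
	}
	return count, nil
}

func main() {
	n, err := countSocketFDs("/proc/self/fd")
	if err != nil {
		fmt.Println("cannot read fd directory:", err)
		return
	}
	fmt.Println("open socket descriptors:", n)
}
```

Logging this count periodically (or after each handled request) should show whether the number of sockets grows without bound, which would confirm a leak before digging into where it comes from.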

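Some commenters suggested that you don't have to create an HTTP client inside each handler call, and that's true. Create it once and reuse it.

Second, I'm not sure why you are creating your own http.Client struct and why you are disabling keep-alives. Can't you just use http.Get? Simpler code is easier to debug.

Third, I'm not sure why you are overriding the transport.Dial function. Even if you must, the documentation (for Go 1.9.2) says: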

% go doc http.transport.dial
type Transport struct {
    // Dial specifies the dial function for creating unencrypted TCP connections.
    //
    // Deprecated: Use DialContext instead, which allows the transport
    // to cancel dials as soon as they are no longer needed.
    // If both are set, DialContext takes priority.
    Dial func(network, addr string) (net.Conn, error)

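That deprecation note, and the lack of dial reuse, may point to the source of your problem.

To sum up, in your shoes I'd do two things:

  • Move client creation to code that executes once, or just use the default client via http.Get.
  • Stop overriding the default transport fields; if you must override them, use DialContext as suggested.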
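The two suggestions above can be sketched roughly as follows. This is an illustration under stated assumptions, not a confirmed fix for the leak: `probeClient` and `probe` are hypothetical names, the five-second dial and handshake limits are arbitrary, and `Transport.DialContext` requires Go 1.7 or later, so it is not available on the Go 1.5.1 toolchain from the question. The local test server in `main` exists only so the example runs without external network access.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"io/ioutil"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// probeClient is created once and shared by every request.
// It sets DialContext instead of the deprecated Dial field.
var probeClient = &http.Client{
	Transport: &http.Transport{
		DialContext:         (&net.Dialer{Timeout: 5 * time.Second}).DialContext,
		TLSHandshakeTimeout: 5 * time.Second,
	},
}

// probe issues a GET request bounded by timeoutSec seconds and
// returns the HTTP status code.
func probe(url string, timeoutSec int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutSec)*time.Second)
	defer cancel()

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return 0, err
	}
	resp, err := probeClient.Do(req.WithContext(ctx))
	if err != nil {
		return 0, err
	}
	// Always close the body; otherwise the underlying socket stays open.
	defer resp.Body.Close()
	// Drain a bounded amount so the connection can go back to the idle pool.
	io.Copy(ioutil.Discard, io.LimitReader(resp.Body, 1<<20))
	return resp.StatusCode, nil
}

func main() {
	// Local stand-in for the monitored host.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	status, err := probe(srv.URL, 5)
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("status:", status)
}
```

The per-request deadline comes from the context rather than from a freshly built client, so the handler's timeout parameter can still vary per call while the transport, and its connection bookkeeping, is shared.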

Good luck.


Answer 2

Score: 0

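I couldn't reproduce the issue, but here are my two cents:

  1. A similar issue was found in SockJS-node and described in the article https://idea.popcount.org/2012-12-09-lsof-cant-identify-protocol/. According to the article, it was observed on FreeBSD, and the cause was that "websockets are not being properly cleaned up".
  2. If you still have access to the same environment, please post Wireshark captures, just to confirm that nothing subtle in the network frames is causing this.

I'm sorry that I can't install Go 1.5.1 just to reproduce this issue.
Hope this helps.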


huangapple
  • Posted on December 7, 2015, 00:26:02
  • Source link: https://go.coder-hub.com/34119757.html