Go program gets massive "read tcp ip:port: i/o timeout" errors on Ubuntu 14.04 LTS

Question

I wrote a Go program that ran well for several months on Ubuntu 12.04 LTS, until I upgraded the servers to 14.04 LTS.

The program mainly sends HTTP requests, about 2-10 per second, to varying addresses.

When the problem occurs, some requests first fail with read tcp [ip]:[port]: i/o timeout; a few minutes later every request fails with the same error and nothing can be sent at all.

After I restart the program, everything works again.

Both of our servers have shown this problem since the upgrade from 12.04 to 14.04.

I create a new goroutine for every request.

The problem does not occur at regular intervals: sometimes it stays away for a day or two, sometimes it occurs twice within an hour.

Below is my code for requesting an HTTP address:

t := &http.Transport{
	Dial:            timeoutDial(data.Timeout),
	TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}
//req := s.ParseReq(data)
req := data.convert2Request()
if req == nil {
	return
}

var resp *http.Response
if data.Redirect {
	c := &http.Client{
		Transport: t,
	}
	resp, err = c.Do(req)
} else {
	resp, err = t.RoundTrip(req)
}

data.updateTry()

r := s.ParseResp(data, resp, err)

updateTry:

func (d *SendData) updateTry() {
	d.Try++
	d.LastSend = time.Now()
}

timeoutDial:

func timeoutDial(timeout int) func(netw, addr string) (net.Conn, error) {
	if timeout <= 0 {
		timeout = 10
	}
	return func(netw, addr string) (net.Conn, error) {
		deadline := time.Now().Add(time.Duration(timeout) * time.Second)
		c, err := net.DialTimeout(netw, addr, time.Second*time.Duration(timeout+5))
		if err != nil {
			return nil, err
		}
		c.SetDeadline(deadline)
		return c, nil
	}
}

This is how I handle the response:

func (s *Sender) ParseResp(data SendData, resp *http.Response, err error) (r Resp) {
	r = Resp{URL: data.URL}
	if err != nil {
		r.Err = err.Error()
	} else {
		r.HttpCode = resp.StatusCode
		r.Header = resp.Header
		r.URL = resp.Request.URL.String()
		defer resp.Body.Close()
		// we just read part of response and log it.
		reader := bufio.NewReader(resp.Body)
		buf := make([]byte, bytes.MinRead) // 512 byte
		for len(r.Body) < 1024 {           // max 1k
			var n int
			if n, _ = reader.Read(buf); n == 0 {
				break
			}
			r.Body += string(buf[:n])
		}
	}
	return
}

I also found that the following settings in /etc/sysctl.conf make the problem happen less frequently:

net.core.somaxconn = 65535
net.netfilter.nf_conntrack_max = 655350
net.netfilter.nf_conntrack_tcp_timeout_established = 1200

I need help solving this problem.

It looks like the following bug, but I don't see any solution there: https://bugs.launchpad.net/juju-core/+bug/1307434

Answer 1

Score: 1

To state more explicitly what Not_a_Golfer and OneOfOne have said: when you're done with the response, you need to close the connection that has been left open (through the Body field, which is an io.ReadCloser). So one simple fix would be to change the code that makes the HTTP request to:

var resp *http.Response
if data.Redirect {
    c := &http.Client{
        Transport: t,
    }
    resp, err = c.Do(req)
} else {
    resp, err = t.RoundTrip(req)
}
if err == nil {
    defer resp.Body.Close() // we need to close the connection
}
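
As a rough sketch of the advice above, and not the asker's actual code: one client can be shared by all goroutines instead of building a new http.Transport per request, and the body is closed on every successful call. This assumes a recent Go toolchain (io.ReadAll needs Go 1.16 or later). The names sharedClient, fetchSnippet, and demo are made up for illustration, and the local httptest server only stands in for a real endpoint.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// sharedClient is created once and reused by every goroutine.
// http.Client and its Transport are safe for concurrent use.
var sharedClient = &http.Client{
	Timeout: 10 * time.Second, // covers connecting, redirects, and reading the body
}

// fetchSnippet (hypothetical helper) performs a GET and returns the status
// code plus at most max bytes of the body. The body is always closed.
func fetchSnippet(c *http.Client, url string, max int64) (int, string, error) {
	resp, err := c.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close() // releases the underlying connection

	b, err := io.ReadAll(io.LimitReader(resp.Body, max))
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(b), nil
}

// demo runs fetchSnippet against a throwaway local test server.
func demo() (int, string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "hello from the test server")
	}))
	defer srv.Close()
	return fetchSnippet(sharedClient, srv.URL, 5)
}

func main() {
	code, body, err := demo()
	fmt.Println(code, body, err)
}
```

Whether a shared client fits depends on the program: the question's code also sets a per-request timeout and InsecureSkipVerify, which this sketch leaves out.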

Answer 2

Score: 0

Without seeing the code to timeoutDial, my wild guess is that you don't close the connection when you're done with it.
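
One way to test a guess like this, sketched here under the assumption that the program runs on Linux (as on the Ubuntu servers in the question), is to log how many file descriptors the process holds and watch whether the number keeps growing. The helpers openFDs and demoLeak are hypothetical names; os.ReadDir needs Go 1.16 or later.

```go
package main

import (
	"fmt"
	"os"
)

// openFDs counts the entries in /proc/self/fd (Linux only). The count
// includes the descriptor used to read the directory itself, so treat
// it as a trend indicator rather than an exact figure.
func openFDs() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

// demoLeak shows that the count rises while a descriptor is held open.
func demoLeak() (before, after int, err error) {
	before, err = openFDs()
	if err != nil {
		return 0, 0, err
	}
	f, err := os.Open("/proc/self/status") // stands in for an unclosed connection
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()
	after, err = openFDs()
	return before, after, err
}

func main() {
	before, after, err := demoLeak()
	if err != nil {
		fmt.Println("could not count descriptors:", err)
		return
	}
	fmt.Printf("open descriptors: before=%d after=%d\n", before, after)
}
```

In a long-running sender, calling openFDs periodically and logging the result would show whether descriptors accumulate between restarts.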

Posted by huangapple on 2015-02-16 00:07:54
Source: https://go.coder-hub.com/28528060.html