Program halts after successive timeouts while performing GET request
Question


I'm making a crawler that fetches HTML, CSS, and JS pages. It is a typical crawler, with 4 goroutines running concurrently to fetch the resources. To study, I've been using 3 test sites. On two of them the crawler works fine and prints the program-completion log.

On the third website, however, too many timeouts occur while fetching CSS links, and this eventually causes my program to stop. It fetches the links, but after 20+ successive timeouts the program stops showing log output. Basically, it halts. I don't think it's a problem with the Event Log console.

Do I need to handle timeouts separately? I'm not posting the full code because it isn't relevant to the conceptual answer I'm seeking. However, the code goes something like this:

for {
	site, more := <-sites
	if more {
		url, err := url.Parse(site)
		if err != nil {
			continue
		}
		response, error := http.Get(url.String())

		if error != nil {
			fmt.Println("There was an error with Get request: ", error.Error())
			continue
		}

		// Crawl function
	}
}

Answer 1

Score: 5


The default HTTP client (the one http.Get uses) has no timeout, so a request to an unresponsive server can block forever. Set a timeout when you create the client: (http://godoc.org/net/http#Client)

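package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Timeout covers connecting, redirects, and reading the body.
	client := http.Client{
		Timeout: 30 * time.Second,
	}
	res, err := client.Get("http://www.google.com")
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	fmt.Println(res)
}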

If no response arrives within 30 seconds, Get returns an error instead of blocking.

huangapple
  • Posted on August 14, 2015, 19:58:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/32009531.html