Program halts after successive timeouts while performing GET requests
Question
I'm making a crawler that fetches HTML, CSS, and JS pages. It is a typical crawler with 4 goroutines running concurrently to fetch the resources. To study it, I've been using 3 test sites. On two of them the crawler works fine and prints its program-completion log.
On the third site, however, too many timeouts happen while fetching CSS links, and this eventually causes my program to stop. It fetches some links, but after 20+ successive timeouts the program stops printing log output. Basically, it halts. I don't think it's a problem with the Event Log console.
Do I need to handle timeouts separately? I'm not posting the full code because it isn't relevant to the conceptual answer I'm looking for. However, the code goes something like this:
	for {
		site, more := <-sites
		if more {
			// Named u so the variable does not shadow the url package.
			u, err := url.Parse(site)
			if err != nil {
				continue
			}
			// http.Get uses http.DefaultClient.
			response, err := http.Get(u.String())
			if err != nil {
				fmt.Println("There was an error with Get request:", err)
				continue
			}
			// Crawl function
		}
	}
Answer 1
Score: 5
The default HTTP client (http.DefaultClient, which http.Get uses) has no timeout, so a request to a server that never responds can block forever. Set a timeout when you create the client: (http://godoc.org/net/http#Client)
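	package main

	import (
		"fmt"
		"net/http"
		"time"
	)

	func main() {
		// A zero Timeout means no timeout, so set one explicitly.
		client := http.Client{
			Timeout: time.Second * 30,
		}
		res, err := client.Get("http://www.google.com")
		if err != nil {
			panic(err)
		}
		defer res.Body.Close()
		fmt.Println(res)
	}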
If no response arrives within 30 seconds, Get returns an error instead of blocking. The same limit also covers reading the response body.