Too many redirects, but through what route?
Question
I've got a simple web scraper/spider based on goquery, which in turn uses net/http. It works great until I hit a website with too many redirects:

> Get http://www.example.com/some/path.html: stopped after 10 redirects

But why? Did it redirect to itself? Did it throw me into some spider jail? I want to know which URLs I was redirected to, and in what order.

The function that returns the error seems to know this, since it checks the length of a slice of requests, but I don't really want to edit the net/http package myself.

Here's that function from http://golang.org/src/pkg/net/http/client.go:

    func defaultCheckRedirect(req *Request, via []*Request) error {
        if len(via) >= 10 {
            return errors.New("stopped after 10 redirects")
        }
        return nil
    }
Answer 1
Score: 2
You can pass your own CheckRedirect function to http.Client, for example:

    client := &http.Client{
        CheckRedirect: func(req *http.Request, via []*http.Request) error {
            // Log each redirect target as it is followed.
            log.Println("redirect", req.URL)
            // via holds the requests made so far, oldest first.
            if len(via) >= 10 {
                return errors.New("stopped after 10 redirects")
            }
            return nil
        },
    }

Setting the CheckRedirect field of http.Client to your own function lets you customize how redirects are handled. In this example the function logs the URL of every redirect as it happens, and returns an error once 10 requests have already been made, which mirrors the default limit.