How to avoid some sites rejecting HTTP GET requests made from Go
Question
We have a script that checks, on a daily basis, every web link in all of our database records (the users want to be notified when a link goes out of date).
A couple of sites work fine through a web browser from this IP address, but when fetched from Go they either drop the connection before the request completes or return an HTTP authorisation-denied response.
I assume some sort of firewall (F5) is filtering or blocking the request. This happens even when I change the HTTP request to use a common user agent. What can we do to make a Go request look like it comes from a standard browser?

    import (
        "net/http"
        "time"
    )

    // fetch_url issues a GET request and returns the HTTP status code.
    func fetch_url(url string, d time.Duration) (int, error) {
        client := &http.Client{
            Timeout: d,
        }
        req, err := http.NewRequest("GET", url, nil)
        if err != nil {
            return 0, err
        }
        // A browser User-Agent on its own did not stop the rejections.
        req.Header.Set("User-Agent", "Mozilla/5.0 (iPad; CPU OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53")
        resp, err := client.Do(req)
        if err != nil {
            return 0, err
        }
        // Close the body on every return path once the request has succeeded.
        defer resp.Body.Close()
        return resp.StatusCode, nil
    }
Answer 1
Score: 3
Try matching the exact headers that your web browser sends, to eliminate other factors. A smart firewall may apply heuristics to tell a web browser from a robot.
Notice that the Go HTTP client sends only a minimal HTTP request (the exact default User-Agent string depends on the Go version):
<!-- language: lang-txt -->

    GET /foo HTTP/1.1
    Host: localhost:3030
    User-Agent: Go 1.1 package http
    Accept-Encoding: gzip

Whereas a web browser is more chatty:
<!-- language: lang-txt -->

    GET /foo HTTP/1.1
    Host: localhost:3030
    Connection: keep-alive
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36
    Accept-Encoding: gzip, deflate, sdch
    Accept-Language: en-US,en;q=0.8
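
As a rough sketch of what matching the browser's headers could look like in Go, the program below copies the Accept, User-Agent and Accept-Language values from the browser request above onto an outgoing request, then prints the request as the client would write it. The names `browserHeaders`, `newBrowserRequest` and `fetchStatus` are made up for this illustration, and `http://localhost:3030/foo` is just the example address from the dumps. Accept-Encoding and Connection are left out on purpose: this sketch assumes you want `net/http` to keep adding its own `Accept-Encoding: gzip`, because setting that header by hand turns off the transport's transparent gzip decoding.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"time"
)

// browserHeaders holds the values copied from the browser request quoted
// in the answer. Adjust them to whatever your own browser actually sends.
var browserHeaders = map[string]string{
	"Accept":          "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
	"User-Agent":      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.89 Safari/537.36",
	"Accept-Language": "en-US,en;q=0.8",
}

// newBrowserRequest (hypothetical helper) builds a GET request that
// carries every header listed in browserHeaders.
func newBrowserRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	for name, value := range browserHeaders {
		req.Header.Set(name, value)
	}
	return req, nil
}

// fetchStatus (hypothetical helper) sends the browser-like request with
// timeout d and returns the HTTP status code.
func fetchStatus(url string, d time.Duration) (int, error) {
	req, err := newBrowserRequest(url)
	if err != nil {
		return 0, err
	}
	client := &http.Client{Timeout: d}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	req, err := newBrowserRequest("http://localhost:3030/foo")
	if err != nil {
		fmt.Println("building request:", err)
		return
	}
	// DumpRequestOut renders the request as the client would send it,
	// including headers the transport adds itself, without contacting the server.
	dump, err := httputil.DumpRequestOut(req, false)
	if err != nil {
		fmt.Println("dumping request:", err)
		return
	}
	fmt.Printf("%s", dump)
}
```

Comparing the printed request with the one captured from the browser shows which headers still differ. If the site keeps rejecting the request after the headers match, something other than the headers is probably being checked.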