Golang HTTP GET Request Not Resolving for Some URLs
Question
I was trying to build a website status checker. I found that for some URLs, such as https://www.hetzner.com, the Go HTTP GET request never resolves and hangs forever. The same URL works with curl.
Golang
No error is returned; it just hangs on http.Get:
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	resp, err := http.Get("https://www.hetzner.com")
	if err != nil {
		fmt.Println("Error while retrieving site", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Error while reading response body", err)
		return
	}
	fmt.Println("RESPONSE", string(body))
}
CURL
I get the response when running the following command:
curl https://www.hetzner.com
What might be the reason, and how do I fix this with Go's HTTP client?
Answer 1
Score: 1
Your specific case can be fixed by setting the HTTP User-Agent header:
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	client := &http.Client{}
	req, err := http.NewRequest(http.MethodGet, "https://www.hetzner.com", nil)
	if err != nil {
		fmt.Println("Error while creating request", err)
		return
	}
	// Setting an explicit User-Agent is what fixes this particular host.
	req.Header.Set("User-Agent", "Golang_Spider_Bot/3.0")
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Error while retrieving site", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Error while reading response body", err)
		return
	}
	fmt.Println("RESPONSE", string(body))
}
Note: many other hosts may reject requests from your server because of security rules on their side. Some common triggers:
- An empty or bot-like User-Agent HTTP header.
- The location of your IP address. For example, an online shop in the USA may have no need to serve requests from Russia.
- The autonomous system (ASN) or CIDR range of your provider. Some ASNs are blackholed entirely because of heavy malicious activity from their customers.
Note 2: Many modern websites sit behind DDoS protection or a CDN. If Cloudflare protects your target website, your HTTP request can be blocked even though the status code is 200. To handle this, you need something that can render JavaScript-based websites, plus scripts to solve captchas.
Also, if you check a large number of websites in a short time, your DNS servers may block you because they have built-in rate limits. In that case, you may want to look at massdns or similar solutions.