August 30, 2022 16:14:01 · go


Golang HTTP GET Request Not Resolving for Some URLs

Question

I was trying to build a website status checker. I found that for some URLs, such as https://www.hetzner.com, a Go HTTP GET request never resolves and hangs forever, while the same URL works with curl.

Golang

No error is returned here; the program just hangs on http.Get.

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	resp, err := http.Get("https://www.hetzner.com")
	if err != nil {
		fmt.Println("Error while retrieving site", err)
		return // resp is nil on error; deferring resp.Body.Close() would panic
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Error while reading response body", err)
		return
	}
	fmt.Println("RESPONSE", string(body))
}

CURL

Running the following command returns the response:

curl https://www.hetzner.com

What could be the reason, and how can I fix this with Go's net/http client?

Answer 1

Score: 1

Your specific case can be fixed by setting the User-Agent HTTP header:

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	client := &http.Client{}

	req, err := http.NewRequest("GET", "https://www.hetzner.com", nil)
	if err != nil {
		fmt.Println("Error while creating request", err)
		return
	}

	// Some hosts treat an empty or bot-like User-Agent differently,
	// so send an explicit one.
	req.Header.Set("User-Agent", "Golang_Spider_Bot/3.0")

	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("Error while retrieving site", err)
		return // resp is nil on error
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("Error while reading response body", err)
		return
	}
	fmt.Println("RESPONSE", string(body))
}

Note: many other hosts may reject requests from your server because of security rules on their side, for example based on:

Empty or bot-like User-Agent HTTP header
The location of your IP address. For example, an online shop in the USA may have no need to accept requests from Russia.
The autonomous system (ASN) or CIDR range of your provider. Some ASNs are blackholed entirely because of the large amount of malicious activity originating from them.

Note 2: Many modern websites sit behind DDoS protection or a CDN. If Cloudflare protects your target website, your HTTP request may be blocked even though the status code is 200, because you receive a challenge page instead of the real content. To handle this, you would need a client that can render JavaScript-based pages, plus scripts to solve the captcha.

Also, if you check a large number of websites in a short time, your DNS servers may block you because they have built-in rate limits. In that case, take a look at massdns or similar tools.
