Golang colly crawling error: Too Many Requests

Question

I'm trying to scrape some information from Google Trends, but every time I try to fetch data I get a Too Many Requests error. Other sites work fine.

My code:

func Teste(searchTrend string) {
	// Trim surrounding spaces and percent-encode the remaining ones for the query string.
	searchTrend = strings.Trim(searchTrend, " ")
	searchTrend = strings.Replace(searchTrend, " ", "%20", -1)

	linkTrends := "https://trends.google.com.br/trends/explore?geo=BR&q=" + searchTrend

	c := colly.NewCollector()

	c.SetRequestTimeout(120 * time.Second)

	// Log every outgoing request.
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Scraping:", r.URL)
	})

	// Log the status code of every successful response.
	c.OnResponse(func(r *colly.Response) {
		fmt.Println("Status:", r.StatusCode)
	})

	// Log failed requests together with the response body.
	c.OnError(func(r *colly.Response, e error) {
		log.Println("error:", e, r.Request.URL, string(r.Body))
	})

	c.OnHTML("div.widget-template", func(h *colly.HTMLElement) {
		...
	})

	c.Visit(linkTrends)
}

The error:
That’s an error.

We're sorry, but you have sent too many requests to us recently. Please try again later. That’s all we know.

Answer 1

Score: 1

Your code looks fine. However, Google usually blocks crawling from sources it does not trust, which is why it returns 429 Too Many Requests.
There are tools that let you sign up and crawl Google Trends data through them. One example is this.
To check that your scraper itself works, try it against another website; it should behave as expected.

huangapple
  • Posted on 2022-11-18 02:41:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74480550.html