Golang colly crawling error Too Many Requests
Question
I'm trying to scrape some information from Google Trends, but every time I try to fetch data I receive a Too Many Requests error. Other sites work fine.
My code:
func Teste(searchTrend string) {
	searchTrend = strings.Trim(searchTrend, " ")
	searchTrend = strings.Replace(searchTrend, " ", "%20", -1)
	linkTrends := "https://trends.google.com.br/trends/explore?geo=BR&q=" + searchTrend

	c := colly.NewCollector()
	c.SetRequestTimeout(120 * time.Second)

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Scraping:", r.URL)
	})
	c.OnResponse(func(r *colly.Response) {
		fmt.Println("Status:", r.StatusCode)
	})
	c.OnError(func(r *colly.Response, e error) {
		log.Println("error:", e, r.Request.URL, string(r.Body))
	})
	c.OnHTML("div.widget-template", func(h *colly.HTMLElement) {
		...
	})

	c.Visit(linkTrends)
}
The error:
That’s an error. We're sorry, but you have sent too many requests to us recently. Please try again later. That’s all we know.
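As an aside, the hand-rolled %20 replacement in the snippet above can be delegated to the standard library's net/url package, which escapes the whole query string correctly. A minimal sketch (buildTrendsURL is a hypothetical helper, not part of the original code):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildTrendsURL builds the Google Trends explore URL, letting
// url.Values handle query-string escaping instead of replacing
// spaces by hand.
func buildTrendsURL(searchTrend string) string {
	q := url.Values{}
	q.Set("geo", "BR")
	q.Set("q", strings.TrimSpace(searchTrend))
	// Encode sorts keys and percent-encodes values ("+" for spaces).
	return "https://trends.google.com.br/trends/explore?" + q.Encode()
}

func main() {
	fmt.Println(buildTrendsURL("  bitcoin price  "))
}
```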
Answer 1
Score: 1
Your code looks fine. However, Google usually blocks crawling activity from untrusted sources, which is why it returns 429 Too Many Requests.
You can find tools out there that let you sign up and crawl their trends data. An example can be this.
To verify that your setup works, try scraping another website; it should behave as expected.
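If you still want to try colly directly, adding a rate limit and a browser-like User-Agent sometimes reduces 429 responses on other sites. A sketch using colly's LimitRule (the delay values and User-Agent string are arbitrary assumptions, and Google may well keep blocking Trends regardless):

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/gocolly/colly/v2"
)

func main() {
	// Send a browser-style User-Agent instead of colly's default.
	c := colly.NewCollector(
		colly.UserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
	)
	c.SetRequestTimeout(120 * time.Second)

	// At most one concurrent request to Google domains, with a
	// randomized delay between requests.
	if err := c.Limit(&colly.LimitRule{
		DomainGlob:  "*google.*",
		Parallelism: 1,
		Delay:       5 * time.Second,
		RandomDelay: 2 * time.Second,
	}); err != nil {
		log.Fatal(err)
	}

	c.OnResponse(func(r *colly.Response) {
		fmt.Println("Status:", r.StatusCode)
	})

	c.Visit("https://trends.google.com.br/trends/explore?geo=BR&q=test")
}
```

This throttles the collector rather than bypassing the block; for reliable Trends data a dedicated API or service, as suggested above, is the safer route.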
Comments