Golang colly crawling error Too Many Requests
Question
I'm trying to scrape some information from Google Trends, but every time I try to fetch data I receive a Too Many Requests error. Other sites work fine.
My code:
func Teste(searchTrend string) {
	searchTrend = strings.Trim(searchTrend, " ")
	searchTrend = strings.Replace(searchTrend, " ", "%20", -1)
	linkTrends := "https://trends.google.com.br/trends/explore?geo=BR&q=" + searchTrend

	c := colly.NewCollector()
	c.SetRequestTimeout(120 * time.Second)

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Scraping:", r.URL)
	})
	c.OnResponse(func(r *colly.Response) {
		fmt.Println("Status:", r.StatusCode)
	})
	c.OnError(func(r *colly.Response, e error) {
		log.Println("error:", e, r.Request.URL, string(r.Body))
	})
	c.OnHTML("div.widget-template", func(h *colly.HTMLElement) {
		...
	})

	c.Visit(linkTrends)
}
The error:
That’s an error. We're sorry, but you have sent too many requests to us recently. Please try again later. That’s all we know.
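As an aside, the hand-rolled %20 replacement in the snippet above can be delegated to the standard library's net/url package, which escapes the whole query string correctly. A minimal sketch (buildTrendsURL is a hypothetical helper, not part of the original code):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildTrendsURL builds the Google Trends explore URL, letting
// url.Values handle query-string escaping instead of replacing
// spaces by hand.
func buildTrendsURL(searchTrend string) string {
	q := url.Values{}
	q.Set("geo", "BR")
	q.Set("q", strings.TrimSpace(searchTrend))
	// Encode sorts keys and percent-encodes values ("+" for spaces).
	return "https://trends.google.com.br/trends/explore?" + q.Encode()
}

func main() {
	fmt.Println(buildTrendsURL("  bitcoin price  "))
}
```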
Answer 1
Score: 1
Your code looks fine. However, Google usually blocks crawling activity from untrusted sources, which is why it returns 429 Too Many Requests.
You can find tools out there that let you sign up and crawl their trends data. An example can be this.
To verify that your setup works, try scraping another website; it should behave as expected.
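If you still want to try colly directly, adding a rate limit and a browser-like User-Agent sometimes reduces 429 responses on other sites. A sketch using colly's LimitRule (the delay values and User-Agent string are arbitrary assumptions, and Google may well keep blocking Trends regardless):

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/gocolly/colly/v2"
)

func main() {
	// Send a browser-style User-Agent instead of colly's default.
	c := colly.NewCollector(
		colly.UserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"),
	)
	c.SetRequestTimeout(120 * time.Second)

	// At most one concurrent request to Google domains, with a
	// randomized delay between requests.
	if err := c.Limit(&colly.LimitRule{
		DomainGlob:  "*google.*",
		Parallelism: 1,
		Delay:       5 * time.Second,
		RandomDelay: 2 * time.Second,
	}); err != nil {
		log.Fatal(err)
	}

	c.OnResponse(func(r *colly.Response) {
		fmt.Println("Status:", r.StatusCode)
	})

	c.Visit("https://trends.google.com.br/trends/explore?geo=BR&q=test")
}
```

This throttles the collector rather than bypassing the block; for reliable Trends data a dedicated API or service, as suggested above, is the safer route.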
Comments