Go Colly not returning any data from website

Question
I am trying to make a simple web scraper in Go, and I can't seem to get even the most basic functionality out of colly. I took the basic example from the colly docs, and while it worked with the hackernews site they used, it isn't working with the site I am trying to scrape. I tried several variations of the URL, i.e. with https://, with www., with a trailing /, etc., and nothing seems to work. I tried scraping the same site with Beautiful Soup in Python and got everything, so I know the site can be scraped. Any help is appreciated. Thanks.
package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

// main function
func main() {
	// instantiate colly
	c := colly.NewCollector(
		colly.AllowedDomains("www.bjjheroes.com/"),
	)

	// On every a element that has an href attribute, call the callback
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		fmt.Printf("Link found: %q\n", e.Text)
	})

	c.Visit("www.bjjheroes.com/a-z-bjj-fighters-list")
}
Answer 1

Score: 3
The "error" was on my part: the allowed-domains list needed several more variations. After changing it to

colly.AllowedDomains(
	"www.bjjheroes.com/",
	"bjjheroes.com/",
	"https://bjjheroes.com/",
	"www.bjjheroes.com",
	"bjjheroes.com",
	"https://bjjheroes.com",
),

everything worked.