How to add the start of a url to a colly link list
Question
I'm fairly new to Go and am trying to scrape several web pages using colly. Two of the pages return incomplete (relative) links. Here is the code and its output:
func PaloNet() {
	c := colly.NewCollector(
		colly.AllowedDomains("security.paloaltonetworks.com"),
	)
	c.OnHTML(".list", func(e *colly.HTMLElement) {
		PaloNetlinks := e.ChildAttrs("a", "href")
		fmt.Println("\n\n PaloAlto Security: \n\n", PaloNetlinks)
	})
	c.Visit("https://security.paloaltonetworks.com/")
}
Output:
[/CVE-2022-0031 /CVE-2022-42889 /PAN-SA-2022-0006 /CVE-2022-0030 /CVE-2022-0029 /PAN-SA-2022-0005 /CVE-2022-28199 /PAN-SA-2022-0004 /CVE-2022-0028 /PAN-SA-2022-0003 /CVE-2022-0024 /CVE-2022-0026 /CVE-2022-0025 /CVE-2022-0027 /PAN-SA-2022-0001 /PAN-SA-2022-0002 /CVE-2022-0023 /CVE-2022-0778 /CVE-2022-22963 /CVE-2022-0022 /CVE-2021-44142 /CVE-2022-0016 /CVE-2022-0017 /CVE-2022-0020 /CVE-2022-0011 /csv?]
As you can see, the links are missing the 'https://security.paloaltonetworks.com' prefix. What would be the best way to add it to the start of each link?
Answer 1
Score: 1
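You can do it like this. Every href on the page starts with "/", so prepending the site origin (without a trailing slash) gives the full URL: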
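func PaloNet() {
	// No trailing slash: the scraped hrefs already begin with "/".
	visitUrl := "https://security.paloaltonetworks.com"
	urls := []string{}
	c := colly.NewCollector(
		colly.AllowedDomains("security.paloaltonetworks.com"),
	)
	c.OnHTML(".list", func(e *colly.HTMLElement) {
		PaloNetlinks := e.ChildAttrs("a", "href")
		// Prefix each relative href with the site origin.
		for _, link := range PaloNetlinks {
			urls = append(urls, visitUrl+link)
		}
		fmt.Println("\n\n PaloAlto Security: \n\n", urls)
	})
	// Report a failed request instead of silently ignoring it.
	if err := c.Visit(visitUrl); err != nil {
		fmt.Println("visit failed:", err)
	}
}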
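Plain string concatenation works here because every href is root-relative. If some links could already be absolute, or relative to a subdirectory, resolving them against the page URL is more robust. Below is a minimal sketch using only the standard library's net/url package. The helper name absoluteLinks and the sample hrefs are assumptions made for illustration and are not part of colly.

```go
package main

import (
	"fmt"
	"net/url"
)

// absoluteLinks resolves each href against base, following the usual
// URL reference-resolution rules. The name is hypothetical and exists
// only for this illustration.
func absoluteLinks(base string, hrefs []string) ([]string, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return nil, fmt.Errorf("parse base %q: %w", base, err)
	}
	out := make([]string, 0, len(hrefs))
	for _, href := range hrefs {
		ref, err := url.Parse(href)
		if err != nil {
			return nil, fmt.Errorf("parse href %q: %w", href, err)
		}
		// ResolveReference leaves absolute URLs untouched and
		// completes relative ones using the base URL.
		out = append(out, baseURL.ResolveReference(ref).String())
	}
	return out, nil
}

func main() {
	links, err := absoluteLinks(
		"https://security.paloaltonetworks.com/",
		[]string{"/CVE-2022-0031", "/PAN-SA-2022-0006"},
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range links {
		fmt.Println(l)
	}
}
```

Inside a colly callback, e.Request.AbsoluteURL(href) should give the same kind of result by resolving the href against the URL of the page being scraped, so the loop in the answer could call that instead of concatenating strings.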