How to add the start of a URL to a colly link list

Question

I'm fairly new to Go and am trying to scrape several webpages using colly. Two of the pages have incomplete links. Below are the code and output:

func PaloNet() {
	c := colly.NewCollector(
		colly.AllowedDomains("security.paloaltonetworks.com"),
	)

	c.OnHTML(".list", func(e *colly.HTMLElement) {
		PaloNetlinks := e.ChildAttrs("a", "href")
		fmt.Println("\n\n PaloAlto Security: \n\n", PaloNetlinks)
	})

	c.Visit("https://security.paloaltonetworks.com/")
}

Output:

[/CVE-2022-0031 /CVE-2022-42889 /PAN-SA-2022-0006 /CVE-2022-0030 /CVE-2022-0029 /PAN-SA-2022-0005 /CVE-2022-28199 /PAN-SA-2022-0004 /CVE-2022-0028 /PAN-SA-2022-0003 /CVE-2022-0024 /CVE-2022-0026 /CVE-2022-0025 /CVE-2022-0027 /PAN-SA-2022-0001 /PAN-SA-2022-0002 /CVE-2022-0023 /CVE-2022-0778 /CVE-2022-22963 /CVE-2022-0022 /CVE-2021-44142 /CVE-2022-0016 /CVE-2022-0017 /CVE-2022-0020 /CVE-2022-0011 /csv?]

As you can see, the links are missing the 'https://security.paloaltonetworks.com/' part. What would be the best way to add the start of the link?
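For context, the scraped values are root-relative hrefs, so "adding the start of the link" is really URL resolution against the page that was visited. The following is a minimal, standard-library-only sketch (the helpers `naiveJoin` and `resolve` are hypothetical names, not part of colly) that contrasts plain string concatenation with resolution via Go's `net/url` package. Because the question visits "https://security.paloaltonetworks.com/" with a trailing slash, simple concatenation produces a double slash, while resolving the href the way a browser does yields a clean link.

```go
package main

import (
	"fmt"
	"net/url"
)

// naiveJoin (hypothetical) simply concatenates the base URL and the href.
func naiveJoin(base, href string) string {
	return base + href
}

// resolve (hypothetical) resolves href against the page URL the same way
// a browser resolves a relative link, using net/url.ResolveReference.
func resolve(base, href string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(ref).String(), nil
}

func main() {
	base := "https://security.paloaltonetworks.com/" // URL passed to c.Visit in the question
	href := "/CVE-2022-0031"                            // one of the scraped hrefs

	fmt.Println(naiveJoin(base, href))
	// https://security.paloaltonetworks.com//CVE-2022-0031

	abs, err := resolve(base, href)
	if err != nil {
		panic(err)
	}
	fmt.Println(abs)
	// https://security.paloaltonetworks.com/CVE-2022-0031
}
```

For root-relative hrefs like these, resolution gives the same result whether or not the base URL ends in "/", so you don't have to be careful about trailing slashes.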

Answer 1

Score: 1

You can do it like this:

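package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly/v2" // assumes colly v2; for v1 use "github.com/gocolly/colly"
)

func PaloNet() {
	// Base URL without a trailing slash, because every scraped href already starts with "/".
	visitUrl := "https://security.paloaltonetworks.com"

	c := colly.NewCollector(
		colly.AllowedDomains("security.paloaltonetworks.com"),
	)

	c.OnHTML(".list", func(e *colly.HTMLElement) {
		PaloNetlinks := e.ChildAttrs("a", "href")

		// Build a fresh slice for each matched element so links are not
		// printed again if the page contains more than one ".list".
		urls := make([]string, 0, len(PaloNetlinks))
		for _, link := range PaloNetlinks {
			// "/CVE-2022-0031" becomes "https://security.paloaltonetworks.com/CVE-2022-0031"
			urls = append(urls, visitUrl+link)
		}

		fmt.Println("\n\n PaloAlto Security: \n\n", urls)
	})

	if err := c.Visit(visitUrl); err != nil {
		log.Println("visit failed:", err)
	}
}

func main() {
	PaloNet()
}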

Hope this helps!

  • Posted by huangapple on 2022-11-22 at 18:55:15
  • If you repost this article, please keep the link to it: https://go.coder-hub.com/74531470.html