What can the go-colly library do?

Question

Can the go-colly library crawl all the HTML tags and text content under a div tag? If so, how? I can already get all the text under a div tag, like this:

c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) {
	text = strings.TrimSpace(e.Text)
})

But I don't know how to get the HTML tags under the div tag.

Answer 1

Score: 1

If you are looking for the innerHTML, it is accessible through the element's DOM field using the Html method (e.DOM.Html()).

c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) {
	// e.DOM is the underlying goquery selection; the error is ignored here for brevity.
	html, _ := e.DOM.Html()
	log.Println(html)
})

If you are looking for a specific tag under the matched element, ForEach can be used for this purpose. Its first argument is a selector and its second is a callback function. The callback is called once for each element that matches the selector and is a descendant of the element e.

More information: https://pkg.go.dev/github.com/gocolly/colly@v1.2.0#HTMLElement.ForEach

c.OnHTML("body .post-topic-main .post-topic-des", func(e *colly.HTMLElement) {
	// Text of the matched element as a whole.
	text := strings.TrimSpace(e.Text)
	log.Println(text)
	// Text of each div nested inside it; el (not e) is the nested element.
	e.ForEach("div", func(_ int, el *colly.HTMLElement) {
		text := strings.TrimSpace(el.Text)
		log.Println(text)
	})
})

huangapple
  • Posted on April 7, 2022 at 17:38:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/71779764.html