Extract text from an HTML page in Go

Question

I'm looking for a simple way to get the text of a web page, preferably without resorting to a pile of regular expressions.

I thought I'd check first in case this is already built in, or at least easier to do, in Go.

Answer 1

Score: 3

You could use goquery (github.com/PuerkitoBio/goquery). The library lets you use jQuery-style selectors to pull text and elements out of an HTML document.

This example is taken from the GitHub page:

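package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func ExampleScrape() {
	// Fetch the page ourselves: goquery.NewDocument(url) is deprecated in
	// recent goquery releases in favor of NewDocumentFromReader.
	res, err := http.Get("http://metalsucks.net")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		log.Fatalf("unexpected status: %s", res.Status)
	}

	// Parse the response body into a queryable document.
	doc, err := goquery.NewDocumentFromReader(res.Body)
	if err != nil {
		log.Fatal(err)
	}

	// For every review block, print the text of its <h3> (band) and <i> (title).
	doc.Find(".reviews-wrap article .review-rhs").Each(func(i int, s *goquery.Selection) {
		band := s.Find("h3").Text()
		title := s.Find("i").Text()
		fmt.Printf("Review %d: %s - %s\n", i, band, title)
	})
}

func main() {
	ExampleScrape()
}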

huangapple
  • Posted on November 18, 2014 at 08:05:16
  • If you repost this, please keep the link to this article: https://go.coder-hub.com/26984312.html