Question

Looking for a simple way to get the text of a web page, preferably without resorting to a bunch of regular expressions.

Just thought I'd check first in case this kind of thing is already built in, or at least easier to do in Go.
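Answer 1

Score: 3

You could use goquery (github.com/PuerkitoBio/goquery). The library can be used like jQuery to extract text and elements from an HTML document.

This example is taken from the GitHub page: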

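package main

import (
	"fmt"
	"github.com/PuerkitoBio/goquery"
	"log"
)

func ExampleScrape() {
	// Fetch and parse the page in one call.
	// Note: newer goquery releases mark NewDocument as deprecated in favor of
	// fetching with net/http and calling goquery.NewDocumentFromReader.
	doc, err := goquery.NewDocument("http://metalsucks.net")
	if err != nil {
		log.Fatal(err)
	}
	// Select each review block with a CSS selector, then read the text
	// of the nested <h3> and <i> elements.
	doc.Find(".reviews-wrap article .review-rhs").Each(func(i int, s *goquery.Selection) {
		band := s.Find("h3").Text()
		title := s.Find("i").Text()
		fmt.Printf("Review %d: %s - %s\n", i, band, title)
	})
}

func main() {
	ExampleScrape()
}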



Extract text from html page in Go
