Extract meta description field with goquery

Question

I'm using the goquery package to extract pieces of information from a webpage. Please see my code below. The output after running the function is:

Description field: text/html; charset=iso-8859-15
Description field: width=device-width
Description field: THIS IS THE TEXT I WANT TO EXTRACT

I'm almost there, but I only want to get the meta field whose name is 'description'. Unfortunately, I can't figure out how to add this extra condition to my code.

func ExampleScrapeDescription() {
	htmlCode :=
		`<!doctype html>
<html lang="NL">
	<head>
		<meta http-equiv="content-type" content="text/html; charset=iso-8859-15">
		<meta name="viewport" content="width=device-width">
		<meta name="description" content="THIS IS THE TEXT I WANT TO EXTRACT">
		<title>page title</title>
	</head>
	<body class="fixedHeader">
		page body
	</body>
</html>`

	doc, err := goquery.NewDocumentFromReader(strings.NewReader(htmlCode))
	if err != nil {
		log.Fatal(err)
	}

	doc.Find("meta").Each(func(i int, s *goquery.Selection) {
		description, _ := s.Attr("content")
		fmt.Printf("Description field: %s\n", description)
	})
}

Answer 1

Score: 11

Just check whether the value of the name attribute matches "description":

doc.Find("meta").Each(func(i int, s *goquery.Selection) {
    if name, _ := s.Attr("name"); name == "description" {
        description, _ := s.Attr("content")
        fmt.Printf("Description field: %s\n", description)
    }
})

You may want to compare the value of the name attribute in a case-insensitive manner; for that, you can use strings.EqualFold():

if name, _ := s.Attr("name"); strings.EqualFold(name, "description") {
    // proceed to extract and use the content of description
}
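
To see which name values such a comparison would accept, here is a small standard-library-only sketch. The helper isDescriptionName is hypothetical and exists only for illustration; it wraps the same strings.EqualFold call.

```go
package main

import (
	"fmt"
	"strings"
)

// isDescriptionName is a hypothetical helper that reports whether a meta
// element's name attribute refers to the description, ignoring letter case.
func isDescriptionName(name string) bool {
	return strings.EqualFold(name, "description")
}

func main() {
	// The empty string stands in for a <meta> element without a name attribute.
	for _, name := range []string{"description", "Description", "DESCRIPTION", "viewport", ""} {
		fmt.Printf("%q -> %v\n", name, isDescriptionName(name))
	}
}
```

The first three names match and the last two do not, so pages that write name="Description" are handled as well.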

Answer 2

Score: 3

attr, _ := doc.Find("meta[name='description']").Attr("content")

huangapple
  • Posted on May 27, 2015 at 14:43:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/30474991.html