GoLang – XmlPath Selectors with HTML


Question


I am looking at the documented example here, but it iterates purely over an XML tree, not HTML, so I am still partly confused.

For example, if I wanted to find a specific meta tag within the head tag by name, it seems I cannot? Instead, I need to find it by the order it is in the head tag. In this case, I want the 8th meta tag, which I assume is:

> headTag, err := getByID(xmlroot, "/head/meta[8]/")

But of course, this uses a getByID function to look up a tag name, which I don't believe will work. What is the full list of "getBy..." commands?

Then the problem is how to access the meta tag's contents. The documentation only provides examples for a node's inner content. Would this example work?

> resp.Query = extractValue(headTag, @content)

The @ selector confuses me. Is it appropriate for this case?

In other words:

  1. Is there a proper HTML example available?
  2. Is there a list of correct selectors for IDs, Tags, etc?
  3. Can Tags be found by name, and content extracted from its inner content tag?

Thank you very much!

Answer 1

Score: 7

I know this answer is late, but I still want to recommend the *htmlquery* package, which is simple, powerful, and based on XPath expressions.

The code below is based on @Time-Cooper's example.

package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	// Fetch the page and parse it into an HTML node tree.
	doc, err := htmlquery.LoadURL("https://example.com")
	if err != nil {
		panic(err)
	}

	// Select every <meta> element whose name attribute is "viewport".
	s := htmlquery.Find(doc, "//meta[@name='viewport']")
	if len(s) == 0 {
		fmt.Println("could not find viewport")
		return
	}
	fmt.Println(htmlquery.SelectAttr(s[0], "content"))

	// Alternative, simpler method: select the content attribute directly.
	s2 := htmlquery.FindOne(doc, "//meta[@name='viewport']/@content")
	if s2 != nil {
		fmt.Println(htmlquery.InnerText(s2))
	}
}

Answer 2

Score: 5

XPath does not seem suitable here; you should be using goquery, which is designed for HTML.

Here is an example:

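package main

import (
	"fmt"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	// goquery.NewDocument is deprecated; fetch the page with net/http
	// and hand the response body to NewDocumentFromReader instead.
	resp, err := http.Get("https://example.com")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		panic(err)
	}

	// CSS selector: a <meta name="viewport"> that is a direct child of <head>.
	s := doc.Find(`html > head > meta[name="viewport"]`)
	if s.Length() == 0 {
		fmt.Println("could not find viewport")
		return
	}
	fmt.Println(s.Eq(0).AttrOr("content", ""))
}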

huangapple
  • Posted on February 9, 2017, 00:13:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/42118194.html