February 9, 2017 00:13:25 | go

GoLang - XmlPath Selectors with HTML

Question

I am looking at the documented example here, but it is iterating purely over an XML tree, and not HTML. Therefore, I am still partly confused.

For example, if I want to find a specific meta tag within the head tag by name, it seems I cannot. Instead, I have to find it by its position within the head tag. In this case, I want the 8th meta tag, which I assume is:

> headTag, err := getByID(xmlroot, "/head/meta[8]/")

But of course, this uses a getByID function with a tag name, which I don't believe will work. What is the full list of "getBy..." commands?

The next problem is how to access the meta tag's contents. The documentation only provides examples for the inner content of a tag node. Would this example work?

> resp.Query = extractValue(headTag, @content)

The @ selector confuses me. Is it appropriate for this case?

In other words:

Is there a proper HTML example available?
Is there a list of correct selectors for IDs, tags, etc.?
Can tags be found by name, and content extracted from their inner content tag?

Thank you very much!

Answer 1

Score: 7

I know this answer is late, but I still want to recommend htmlquery, a simple and powerful package based on XPath expressions.

The code below is based on @Time-Cooper's example.

package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	// Fetch the page and parse it into an HTML node tree.
	doc, err := htmlquery.LoadURL("https://example.com")
	if err != nil {
		panic(err)
	}

	// Select every meta element whose name attribute is "viewport".
	s := htmlquery.Find(doc, "//meta[@name='viewport']")
	if len(s) == 0 {
		fmt.Println("could not find viewport")
		return
	}
	fmt.Println(htmlquery.SelectAttr(s[0], "content"))

	// Alternative, simpler method: select the content attribute directly.
	s2 := htmlquery.FindOne(doc, "//meta[@name='viewport']/@content")
	if s2 != nil {
		fmt.Println(htmlquery.InnerText(s2))
	}
}
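
On the question about the @ selector: in XPath, @ addresses an attribute, so //meta[@name='viewport']/@content reads as "the content attribute of any meta element whose name attribute is viewport". As a rough, dependency-free illustration of what that expression selects, here is a minimal sketch that uses only the standard library's encoding/xml decoder in its lenient mode. The helper metaContent and the sample page are made up for this example, and the lenient decoder is an assumption that only holds for reasonably well-formed markup; it is not a replacement for a real HTML parser such as the one htmlquery uses.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// sampleHTML is a made-up, well-formed page used only for this illustration.
const sampleHTML = `<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Example</title>
</head>
<body><p>Hello</p></body>
</html>`

// metaContent is a hypothetical helper. It returns the content attribute of
// the first <meta> element whose name attribute equals name, which is roughly
// what //meta[@name='...']/@content selects.
func metaContent(doc, name string) (string, bool, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	// Lenient settings described in the encoding/xml docs for HTML-like input.
	d.Strict = false
	d.AutoClose = xml.HTMLAutoClose
	d.Entity = xml.HTMLEntity

	for {
		tok, err := d.Token()
		if err == io.EOF {
			return "", false, nil // reached the end without a match
		}
		if err != nil {
			return "", false, err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || !strings.EqualFold(start.Name.Local, "meta") {
			continue
		}
		// Collect the two attributes the XPath expression refers to.
		var gotName, content string
		var hasContent bool
		for _, a := range start.Attr {
			switch strings.ToLower(a.Name.Local) {
			case "name":
				gotName = a.Value
			case "content":
				content, hasContent = a.Value, true
			}
		}
		if gotName == name && hasContent {
			return content, true, nil
		}
	}
}

func main() {
	content, found, err := metaContent(sampleHTML, "viewport")
	if err != nil {
		panic(err)
	}
	if !found {
		fmt.Println("could not find viewport")
		return
	}
	fmt.Println(content)
}
```

Running it prints the value of the sample page's viewport content attribute. Real-world pages often contain markup that this decoder rejects (unescaped script bodies, for instance), which is one reason to prefer a dedicated HTML package for anything beyond a quick experiment.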

Answer 2

Score: 5

XPath does not seem suitable here; you should be using goquery, which is designed for HTML.

Here is an example:

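package main

import (
	"fmt"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	// goquery.NewDocument(url) is deprecated in current goquery releases;
	// fetch the page with net/http and parse the response body instead.
	resp, err := http.Get("https://example.com")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	doc, err := goquery.NewDocumentFromReader(resp.Body)
	if err != nil {
		panic(err)
	}

	// CSS selector: a meta element with name="viewport" directly under head.
	s := doc.Find(`html > head > meta[name="viewport"]`)
	if s.Length() == 0 {
		fmt.Println("could not find viewport")
		return
	}
	fmt.Println(s.Eq(0).AttrOr("content", ""))
}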
