GoLang - XmlPath Selectors with HTML
Question
I am looking at the documented example here, but it iterates purely over an XML tree, not HTML, so I am still partly confused.
For example, if I want to find a specific meta tag within the head tag by name, it seems I cannot. Instead, I have to find it by its position in the head tag. In this case, I want the 8th meta tag, which I assume is:
> headTag, err := getByID(xmlroot, "/head/meta[8]/")
But of course, this uses a getByID function to look up a tag name, which I don't believe will work. What is the full list of "getBy..." functions?
Then, how do I access the meta tag's content? The documentation only shows examples of reading the text content of inner nodes. Will this work?:
> resp.Query = extractValue(headTag, @content)
The @ selector confuses me. Is it appropriate in this case?
In other words:
- Is there a proper HTML example available?
- Is there a list of the correct selectors for IDs, tags, etc.?
- Can tags be found by name, and the value of their content attribute extracted?
Thank you very much!
Answer 1
Score: 7
I know this answer is late, but I still want to recommend htmlquery, a simple and powerful package based on XPath expressions.
The code below is based on @Time-Cooper's example.
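package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	// Fetch the page and parse it into an *html.Node tree.
	doc, err := htmlquery.LoadURL("https://example.com")
	if err != nil {
		panic(err)
	}

	// Select <meta> elements by their name attribute instead of by position.
	s := htmlquery.Find(doc, "//meta[@name='viewport']")
	if len(s) == 0 {
		fmt.Println("could not find viewport")
		return
	}
	// Read the content attribute of the first match.
	fmt.Println(htmlquery.SelectAttr(s[0], "content"))

	// Alternative, shorter method: select the attribute node itself with @content.
	s2 := htmlquery.FindOne(doc, "//meta[@name='viewport']/@content")
	if s2 != nil {
		fmt.Println(htmlquery.InnerText(s2))
	}
}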
Answer 2
Score: 5
XPath does not seem well suited to this; you should use goquery instead, which is designed for HTML and uses CSS selectors.
Here is an example:
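package main

import (
	"fmt"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	// goquery.NewDocument is deprecated; fetch the page with net/http instead.
	res, err := http.Get("https://example.com")
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	doc, err := goquery.NewDocumentFromReader(res.Body)
	if err != nil {
		panic(err)
	}

	// CSS selector: a <meta> directly under <head> whose name is "viewport".
	s := doc.Find(`html > head > meta[name="viewport"]`)
	if s.Length() == 0 {
		fmt.Println("could not find viewport")
		return
	}
	// AttrOr returns the attribute value, or the default ("") if it is missing.
	fmt.Println(s.Eq(0).AttrOr("content", ""))
}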