Golang Gokogiri recursive xpath anomaly
Question
I am trying to run XPath queries against an HTML document, and I want to do a two-level XPath query. The HTML document, "index.html", is as follows:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Document</title>
</head>
<body>
<div class="head">
<div class="area">
<div class="value">10</div>
</div>
<div class="area">
<div class="value">20</div>
</div>
<div class="area">
<div class="value">30</div>
</div>
</div>
</body>
</html>
I want to get all divs with class="area" first, then, within each of them, get the divs with class="value", using Gokogiri in Go.
My Go code is as follows:
package main

import (
    "fmt"
    "io/ioutil"

    "github.com/moovweb/gokogiri"
    "github.com/moovweb/gokogiri/xpath"
)

func main() {
    content, _ := ioutil.ReadFile("index.html")
    doc, _ := gokogiri.ParseHtml(content)
    defer doc.Free()
    xps := xpath.Compile("//div[@class='head']/div[@class='area']")
    xpw := xpath.Compile("//div[@class='value']")
    ss, _ := doc.Root().Search(xps)
    for _, s := range ss {
        ww, _ := s.Search(xpw)
        for _, w := range ww {
            fmt.Println(w.InnerHtml())
        }
    }
}
However, the output I get is odd:
10
20
30
10
20
30
10
20
30
I expected to get:
10
20
30
I want to apply XPath patterns recursively. I think something is wrong with my second-level XPath pattern: it appears to search the whole document again instead of the individual divs with class="area". How do I search with nested XPath patterns? I'd appreciate any help.
Answer 1
Score: 6
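An XPath search started from a node is not automatically limited to that node's subtree: an expression beginning with // is evaluated from the document root, so it still searches the entire tree.
To search just the subtree, start the expression with a . (as in .//, assuming you still want descendant-or-self); otherwise use an exact relative path.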
xps := xpath.Compile("//div[@class='head']/div[@class='area']")
xpw := xpath.Compile(".//div[@class='value']")
// this works in your example case
// xpw := xpath.Compile("div[@class='value']")
// as does this
// xpw := xpath.Compile("./div[@class='value']")
ss, _ := doc.Root().Search(xps)
for _, s := range ss {
    ww, _ := s.Search(xpw)
    for _, w := range ww {
        fmt.Println(w.InnerHtml())
    }
}
Prints:
10
20
30
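To make the difference between the two forms concrete, here is a minimal, dependency-free sketch of the semantics. It is a toy model, not Gokogiri's or libxml2's implementation: the Node type and the functions searchAbsolute, searchRelative, buildDoc, and texts are hypothetical names invented for illustration. searchAbsolute imitates //div[@class='value'] by walking up to the root before searching, while searchRelative imitates .//div[@class='value'] by searching only below the context node.

```go
package main

import "fmt"

// Node is a toy stand-in for an element in a parsed document (hypothetical type).
type Node struct {
	Class    string
	Text     string
	Parent   *Node
	Children []*Node
}

// add appends a child element with the given class and text, and returns it.
func (n *Node) add(class, text string) *Node {
	c := &Node{Class: class, Text: text, Parent: n}
	n.Children = append(n.Children, c)
	return c
}

// descendants collects every descendant of n with the given class, in document order.
func descendants(n *Node, class string) []*Node {
	var out []*Node
	for _, c := range n.Children {
		if c.Class == class {
			out = append(out, c)
		}
		out = append(out, descendants(c, class)...)
	}
	return out
}

// searchAbsolute imitates "//div[@class='...']": the context node is used
// only to find the document root, and the whole tree is searched.
func searchAbsolute(ctx *Node, class string) []*Node {
	root := ctx
	for root.Parent != nil {
		root = root.Parent
	}
	return descendants(root, class)
}

// searchRelative imitates ".//div[@class='...']": only the subtree below
// the context node is searched.
func searchRelative(ctx *Node, class string) []*Node {
	return descendants(ctx, class)
}

// buildDoc recreates the shape of index.html and returns the three "area" nodes.
func buildDoc() []*Node {
	head := (&Node{Class: "body"}).add("head", "")
	var areas []*Node
	for _, v := range []string{"10", "20", "30"} {
		area := head.add("area", "")
		area.add("value", v)
		areas = append(areas, area)
	}
	return areas
}

// texts returns the text of each node, in order.
func texts(nodes []*Node) []string {
	out := make([]string, 0, len(nodes))
	for _, n := range nodes {
		out = append(out, n.Text)
	}
	return out
}

func main() {
	for _, area := range buildDoc() {
		fmt.Println("absolute:", texts(searchAbsolute(area, "value")),
			"relative:", texts(searchRelative(area, "value")))
	}
}
```

In this model, the absolute search returns all three values for every area node, which is why the original program printed nine lines, while the relative search returns only the value inside the current area.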
Answer 2
Score: 2
Your second query, //div[@class='value'], will select divs anywhere in the document regardless of the parent element. Instead, try div[@class='value'], which is relative to the current node and selects only its direct div children.