August 19, 2014, 23:01:12 · go

Golang Gokogiri recursive xpath anomaly

Question

I am trying to run XPath operations on an HTML document, specifically a two-level XPath query. The HTML document "index.html" is as follows:

<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<title>Document</title>
</head>
<body>
	<div class="head">
		<div class="area">
			<div class="value">10</div>
		</div>
		<div class="area">
			<div class="value">20</div>
		</div>
		<div class="area">
			<div class="value">30</div>
		</div>
	</div>
</body>
</html>

I want to get all divs with class="area" first, and then, within each of them, get the divs with class="value", using Gokogiri in Go.

My Go code is as follows:
package main

import (
	"fmt"
	"io/ioutil"

	"github.com/moovweb/gokogiri"
	"github.com/moovweb/gokogiri/xpath"
)

func main() {
	content, _ := ioutil.ReadFile("index.html")

	doc, _ := gokogiri.ParseHtml(content)
	defer doc.Free()

	xps := xpath.Compile("//div[@class='head']/div[@class='area']")
	xpw := xpath.Compile("//div[@class='value']")
	ss, _ := doc.Root().Search(xps)
	for _, s := range ss {
		ww, _ := s.Search(xpw)
		for _, w := range ww {
			fmt.Println(w.InnerHtml())
		}
	}
}

However, the output I get is odd:

I intended to get:

10
20
30

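I want to search XPath patterns recursively. I think something is wrong with my second-level XPath pattern: it appears to search the whole document again instead of searching within each div with class="area". How should I do a recursive XPath search? I'd appreciate any help.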

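Answer 1

Score: 6

An XPath search started from any node can still search the entire tree: an expression that begins with // is resolved from the document root, not from the node you call Search on.

If you want to search only the subtree, start the expression with . (assuming you still want the descendant-or-self step); otherwise use an exact path.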

xps := xpath.Compile("//div[@class='head']/div[@class='area']")
xpw := xpath.Compile(".//div[@class='value']")

// this also works in your example case
// xpw := xpath.Compile("div[@class='value']")
// as does this
// xpw := xpath.Compile("./div[@class='value']")

ss, _ := doc.Root().Search(xps)
for _, s := range ss {
    ww, _ := s.Search(xpw)
    for _, w := range ww {
        fmt.Println(w.InnerHtml())
    }
}

Output:

10
20
30

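Answer 2

Score: 2

Your second query, //div[@class='value'], selects divs anywhere in the document, regardless of the parent element. Instead, try div[@class='value'], which selects only matching direct children of the current node.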
