2016-04-08 22:27:06 · go

Convert xpath node back to html-markup in Go

Question

import (
    "fmt"
    "gopkg.in/xmlpath.v2"
    "log"
)
...
path := xmlpath.MustCompile("//div[@id='23']")
tree, err := xmlpath.ParseHTML(reader)
if err != nil {
    log.Fatal("HTML parsing error, maybe not well-formed: ", err)
}
iter := path.Iter(tree)
for iter.Next() {
    fmt.Println(iter.Node().String()) // returns only the values of the text nodes
}
...

Is there a way to convert iter.Node() back to HTML markup like <div>...</div>? iter.Node().String() returns only the values of all inner text nodes. As far as I can see, the documentation of the xmlpath package (https://godoc.org/gopkg.in/xmlpath.v2) does not offer such a function.

Answer 1

Score: 0

You are right: the functions in gopkg.in/xmlpath.v2 are limited to reading the content of nodes, and there are not many alternatives in Go for working with the DOM.

Among native Go libraries I can only mention goquery. It works only with HTML and does not support XPath, but it does support CSS selectors. Maybe that would be enough in your case.

If you really need to work with both HTML and XML via XPath, there is a libxml wrapper for Go called gokogiri. It supports all the features of libxml, so you can get nodes, inner/outer HTML, attributes and other things. I used it to extract text content in a service that is currently in production. It's a bit faster than PHP's DOMDocument. The one limitation is that I'm not sure whether it supports Go versions higher than 1.4.*. Oh, and installation on Windows is a bit tricky.
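
For input that happens to be well-formed XML (for example XHTML), the outer markup of an element can also be recovered with the standard library alone. The following is only a minimal sketch under that assumption, and it is not what xmlpath, goquery or gokogiri do internally. It matches on an id attribute instead of evaluating an XPath expression, and the helper name outerXML is made up for illustration. It relies on encoding/xml's Decoder.InputOffset and Decoder.Skip to find the byte range of the element and then slices the original source. Real-world HTML that is not well-formed will make the strict decoder return an error.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// outerXML (hypothetical helper) returns the raw markup of the first element
// whose id attribute equals id. It assumes src is well-formed XML.
func outerXML(src, id string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(src))
	for {
		// Offset where the next token begins.
		start := dec.InputOffset()
		tok, err := dec.Token()
		if err == io.EOF {
			return "", fmt.Errorf("no element with id %q", id)
		}
		if err != nil {
			return "", err
		}
		el, ok := tok.(xml.StartElement)
		if !ok {
			continue
		}
		for _, a := range el.Attr {
			if a.Name.Local == "id" && a.Value == id {
				// Skip consumes everything up to and including the
				// end tag that matches this start element.
				if err := dec.Skip(); err != nil {
					return "", err
				}
				return src[start:dec.InputOffset()], nil
			}
		}
	}
}

func main() {
	doc := `<html><body><div id="23"><p>Hello <b>world</b></p></div><div id="24">other</div></body></html>`
	out, err := outerXML(doc, "23")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Slicing the source keeps the markup byte-for-byte as it was written, which avoids any re-serialization differences, but it also means the whole document has to be held in memory as a string.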

Answer 2

Score: 0

I know this answer is late, but I still recommend these packages written in native Go: xquery and xpath. They support extracting data or evaluating values from XML/HTML using XPath expressions.
