2013-02-26 12:01:28 (tag: go)

Any way to use html.Parse without it adding nodes to make a 'well-formed tree'?

Question

package main

import (
	"bytes"
	"code.google.com/p/go.net/html"
	"fmt"
	"log"
	"strings"
)

func main() {
	s := "Blah. <b>Blah.</b> Blah."
	n, err := html.Parse(strings.NewReader(s))
	if err != nil {
		log.Fatalf("Parse error: %s", err)
	}
	var buf bytes.Buffer
	if err := html.Render(&buf, n); err != nil {
		log.Fatalf("Render error: %s", err)
	}
	fmt.Println(buf.String())
}

Output:

<html><head></head><body>Blah. <b>Blah.</b> Blah.</body></html>

Is there a way to stop html.Parse from making a document out of fragments (i.e. avoid adding <html>, <body>, etc.)? I'm aware of html.ParseFragment, but it seems to exhibit the same behaviour.

You can get around it by wrapping the text to be parsed with a parent element such as <span> then doing something like the following:

n = n.FirstChild.LastChild.FirstChild

but that seems, well, kludgy to say the least.

Ideally I'd like to: accept input, manipulate or remove nodes found within it, and write the result back to a string, even if the result is an incomplete document.

Answer 1

Score: 13

You need to provide a context to ParseFragment. The following program prints out the original text:

package main

import (
	"bytes"
	"code.google.com/p/go.net/html"
	"code.google.com/p/go.net/html/atom"
	"fmt"
	"log"
	"strings"
)

func main() {
	s := "Blah. <b>Blah.</b> Blah."
	n, err := html.ParseFragment(strings.NewReader(s), &html.Node{
		Type:     html.ElementNode,
		Data:     "body",
		DataAtom: atom.Body,
	})
	if err != nil {
		log.Fatalf("Parse error: %s", err)
	}
	var buf bytes.Buffer
	for _, node := range n {
		if err := html.Render(&buf, node); err != nil {
			log.Fatalf("Render error: %s", err)
		}
	}
	fmt.Println(buf.String())
}

Answer 2

Score: 6

You want http://godoc.org/code.google.com/p/go.net/html#ParseFragment. Pass in a fake Body element as your context and the fragment will be returned as a slice of just the elements in your fragment.

You can see an example in the Partial* functions for go-html-transform's go.net/html wrapper package. https://code.google.com/p/go-html-transform/source/browse/h5/h5.go#32
