Any way to use html.Parse without it adding nodes to make a 'well-formed tree'?

Question

```go
package main

import (
	"bytes"
	"code.google.com/p/go.net/html" // this package now lives at golang.org/x/net/html
	"fmt"
	"log"
	"strings"
)

func main() {
	s := "Blah. <b>Blah.</b> Blah."
	n, err := html.Parse(strings.NewReader(s))
	if err != nil {
		log.Fatalf("Parse error: %s", err)
	}
	var buf bytes.Buffer
	if err := html.Render(&buf, n); err != nil {
		log.Fatalf("Render error: %s", err)
	}
	fmt.Println(buf.String())
}
```

Output:

```
<html><head></head><body>Blah. <b>Blah.</b> Blah.</body></html>
```

Is there a way to stop html.Parse from making a document out of fragments (i.e. avoid adding <html>, <body>, etc.)? I'm aware of html.ParseFragment, but it seems to exhibit the same behaviour.

You can get around it by wrapping the text to be parsed in a parent element such as <span> and then doing something like the following:

```go
n = n.FirstChild.LastChild.FirstChild
```

but that seems, well, kludgy to say the least.

Ideally I'd like to: accept input, manipulate or remove nodes found within it, and write the result back to a string, even if the result is an incomplete document.

Answer 1

Score: 13

You need to provide a context to ParseFragment. The following program prints out the original text:

```go
package main

import (
	"bytes"
	"code.google.com/p/go.net/html"      // now golang.org/x/net/html
	"code.google.com/p/go.net/html/atom" // now golang.org/x/net/html/atom
	"fmt"
	"log"
	"strings"
)

func main() {
	s := "Blah. <b>Blah.</b> Blah."
	// Parse s as if it appeared inside a <body> element, so no
	// <html>/<head>/<body> wrappers are synthesized around it.
	n, err := html.ParseFragment(strings.NewReader(s), &html.Node{
		Type:     html.ElementNode,
		Data:     "body",
		DataAtom: atom.Body,
	})
	if err != nil {
		log.Fatalf("Parse error: %s", err)
	}
	var buf bytes.Buffer
	// n is a slice of the fragment's top-level nodes; render them in order.
	for _, node := range n {
		if err := html.Render(&buf, node); err != nil {
			log.Fatalf("Render error: %s", err)
		}
	}
	fmt.Println(buf.String())
}
```

Answer 2

Score: 6

You want http://godoc.org/code.google.com/p/go.net/html#ParseFragment. Pass in a fake body element as your context, and the fragment will be returned as a slice of just the nodes (elements and text) in your fragment.

You can see an example in the Partial* functions for go-html-transform's go.net/html wrapper package. https://code.google.com/p/go-html-transform/source/browse/h5/h5.go#32

huangapple
  • Posted on 26 February 2013 at 12:01:28
  • Original link: https://go.coder-hub.com/15081119.html