2014-01-14 23:21:27 · go

Go: How would you "Pretty Print"/"Prettify" HTML?

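Question

In Python, PHP, and many other languages, it is possible to take an HTML document and "prettify" it. In Go, this is easy to do for JSON and XML (from a struct/interface) using the MarshalIndent function.

Example for XML in Go (http://play.golang.org/p/aBNfNxTEG1):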

package main
import (
	"encoding/xml"
	"fmt"
	"os"
)
func main() {
	type Address struct {
		City, State string
	}
	type Person struct {
		XMLName   xml.Name `xml:"person"`
		Id        int      `xml:"id,attr"`
		FirstName string   `xml:"name>first"`
		LastName  string   `xml:"name>last"`
		Age       int      `xml:"age"`
		Height    float32  `xml:"height,omitempty"`
		Married   bool
		Address
		Comment string `xml:",comment"`
	}
	v := &Person{Id: 13, FirstName: "John", LastName: "Doe", Age: 42}
	v.Comment = " Need more details. "
	v.Address = Address{"Hanga Roa", "Easter Island"}
	output, err := xml.MarshalIndent(v, "  ", "    ")
	if err != nil {
		fmt.Printf("error: %v\n", err)
	}
	os.Stdout.Write(output)
}

However, this only works for converting a struct/interface into a []byte. What I want is to take a string of HTML code and indent it automatically. Example:

Raw HTML

<!doctype html><html><head>
<title>Website Title</title>
</head><body>
<div class="random-class">
<h1>I like pie</h1><p>It's true!</p></div>
</body></html>

Prettified HTML

<!doctype html>
<html>
	<head>
		<title>Website Title</title>
	</head>
	<body>
		<div class="random-class">
			<h1>I like pie</h1>
			<p>It's true!</p>
		</div>
	</body>
</html>

How would this be done using just a string?

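Answer 1

Score: 16

I faced the same problem and solved it by writing an HTML formatting package in Go myself.

Here it is:

GoHTML - HTML formatter for Go

Please check this package out.

Thanks,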

Keiji

Answer 2

Score: 8

I found this question while trying to figure out how to pretty-print XML in Go. Since I didn't find the answer anywhere, here's my solution:

import (
	"bytes"
	"encoding/xml"
	"io"
)
func formatXML(data []byte) ([]byte, error) {
	b := &bytes.Buffer{}
	decoder := xml.NewDecoder(bytes.NewReader(data))
	encoder := xml.NewEncoder(b)
	encoder.Indent("", "  ")
	for {
		token, err := decoder.Token()
		if err == io.EOF {
			encoder.Flush()
			return b.Bytes(), nil
		}
		if err != nil {
			return nil, err
		}
		err = encoder.EncodeToken(token)
		if err != nil {
			return nil, err
		}
	}
}

This code pretty-prints XML data.

Answer 3

Score: 5

EDIT: Found a great way using the XML parser:

package main
import (
	"encoding/xml"
	"fmt"
)
func main() {
	html := "<html><head><title>Website Title</title></head><body><div class=\"random-class\"><h1>I like pie</h1><p>It's true!</p></div></body></html>"
	type node struct {
		Attr     []xml.Attr
		XMLName  xml.Name
		Children []node `xml:",any"`
		Text     string `xml:",chardata"`
	}
	x := node{}
	_ = xml.Unmarshal([]byte(html), &x)
	buf, _ := xml.MarshalIndent(x, "", "\t")
	fmt.Println(string(buf))
}

This outputs the following (note that the class attribute on the div is not preserved):

<html>
	<head>
		<title>Website Title</title>
	</head>
	<body>
		<div>
			<h1>I like pie</h1>
			<p>It&#39;s true!</p>
		</div>
	</body>
</html>

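Answer 4

Score: 2

You could parse the HTML with code.google.com/p/go.net/html (now golang.org/x/net/html) and write your own version of that package's Render function, one that keeps track of indentation.

But let me warn you: be careful when adding and removing whitespace in HTML. Although whitespace is usually not significant, spaces can appear in or disappear from the rendered text if you're not careful.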
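
To make the whitespace warning concrete: `<b>I like</b> <i>pie</i>` renders as "I like pie", but dropping the whitespace-only text between the two inline elements yields "I likepie". One conservative option, sketched below, is to collapse each run of whitespace to a single space instead of trimming it away. The helper names collapseSpace and isHTMLSpace are hypothetical and not part of any package mentioned here, and such a helper should not be applied inside pre or textarea, where whitespace matters verbatim.

```go
package main

import (
	"fmt"
	"strings"
)

// isHTMLSpace reports whether r is ASCII whitespace as HTML treats it.
// Non-breaking spaces (U+00A0) are deliberately excluded.
func isHTMLSpace(r rune) bool {
	switch r {
	case ' ', '\t', '\n', '\r', '\f':
		return true
	}
	return false
}

// collapseSpace (hypothetical helper) replaces every run of whitespace
// with one space. Leading and trailing runs are kept as a single space
// so that neighbouring inline elements stay separated.
func collapseSpace(s string) string {
	if s == "" {
		return ""
	}
	words := strings.FieldsFunc(s, isHTMLSpace)
	if len(words) == 0 {
		return " " // whitespace-only text still separates its neighbours
	}
	out := strings.Join(words, " ")
	if isHTMLSpace(rune(s[0])) {
		out = " " + out
	}
	if isHTMLSpace(rune(s[len(s)-1])) {
		out += " "
	}
	return out
}

func main() {
	fmt.Printf("%q\n", collapseSpace("\n\t  "))           // " "
	fmt.Printf("%q\n", collapseSpace("  I like\n  pie ")) // " I like pie "
	fmt.Printf("%q\n", collapseSpace("pie"))              // "pie"
}
```

This trades prettier output for safety: text inside inline content keeps its word boundaries, at the cost of occasionally leaving a space that a more aggressive formatter would remove.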

Edit: Here's a pretty-printer function I wrote recently. It handles some of the special cases, but not all of them.

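import (
	"bytes"
	"fmt"
	"strings"

	"golang.org/x/net/html" // successor to code.google.com/p/go.net/html
)

func prettyPrint(b *bytes.Buffer, n *html.Node, depth int) {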
	switch n.Type {
	case html.DocumentNode:
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			prettyPrint(b, c, depth)
		}
	case html.ElementNode:
		justRender := false
		switch {
		case n.FirstChild == nil:
			justRender = true
		case n.Data == "pre" || n.Data == "textarea":
			justRender = true
		case n.Data == "script" || n.Data == "style":
			break
		case n.FirstChild == n.LastChild && n.FirstChild.Type == html.TextNode:
			if !isInline(n) {
				c := n.FirstChild
				c.Data = strings.Trim(c.Data, " \t\n\r")
			}
			justRender = true
		case isInline(n) && contentIsInline(n):
			justRender = true
		}
		if justRender {
			indent(b, depth)
			html.Render(b, n)
			b.WriteByte('\n')
			return
		}
		indent(b, depth)
		fmt.Fprintln(b, html.Token{
			Type: html.StartTagToken,
			Data: n.Data,
			Attr: n.Attr,
		})
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			if (n.Data == "script" || n.Data == "style") && c.Type == html.TextNode {
				prettyPrintScript(b, c.Data, depth+1)
			} else {
				prettyPrint(b, c, depth+1)
			}
		}
		indent(b, depth)
		fmt.Fprintln(b, html.Token{
			Type: html.EndTagToken,
			Data: n.Data,
		})
	case html.TextNode:
		n.Data = strings.Trim(n.Data, " \t\n\r")
		if n.Data == "" {
			return
		}
		indent(b, depth)
		html.Render(b, n)
		b.WriteByte('\n')
	default:
		indent(b, depth)
		html.Render(b, n)
		b.WriteByte('\n')
	}
}
func isInline(n *html.Node) bool {
	switch n.Type {
	case html.TextNode, html.CommentNode:
		return true
	case html.ElementNode:
		switch n.Data {
		case "b", "big", "i", "small", "tt", "abbr", "acronym", "cite", "dfn", "em", "kbd", "strong", "samp", "var", "a", "bdo", "img", "map", "object", "q", "span", "sub", "sup", "button", "input", "label", "select", "textarea":
			return true
		default:
			return false
		}
	default:
		return false
	}
}
func contentIsInline(n *html.Node) bool {
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if !isInline(c) || !contentIsInline(c) {
			return false
		}
	}
	return true
}
func indent(b *bytes.Buffer, depth int) {
	depth *= 2
	for i := 0; i < depth; i++ {
		b.WriteByte(' ')
	}
}
func prettyPrintScript(b *bytes.Buffer, s string, depth int) {
	for _, line := range strings.Split(s, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		depthChange := 0
		for _, c := range line {
			switch c {
			case '(', '[', '{':
				depthChange++
			case ')', ']', '}':
				depthChange--
			}
		}
		switch line[0] {
		case '.':
			indent(b, depth+1)
		case ')', ']', '}':
			indent(b, depth-1)
		default:
			indent(b, depth)
		}
		depth += depthChange
		fmt.Fprintln(b, line)
	}
}

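Answer 5

Score: 2

Short answer

Use this HTML prettyprint library for Go (that I wrote, *uhum*). It has some tests and works for basic inputs, and will hopefully become more robust over time, though it isn't very robust right now. Note the Known Issues section in the readme.

Long answer

Rolling your own HTML prettifier for simple cases is reasonably easy using the code.google.com/p/go.net/html package (now golang.org/x/net/html), which is what the above package does. Here is a very simple Prettify function implemented this way: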

import (
    "strings"

    "golang.org/x/net/html" // successor to code.google.com/p/go.net/html
)

func Prettify(raw string, indent string) (pretty string, e error) {
    r := strings.NewReader(raw)
    z := html.NewTokenizer(r)
    pretty = ""
    depth := 0
    prevToken := html.CommentToken
    for {
        tt := z.Next()
        tokenString := string(z.Raw())
        // strip away newlines
        if tt == html.TextToken {
            stripped := strings.Trim(tokenString, "\n")
            if len(stripped) == 0 {
                continue
            }
        }
        if tt == html.EndTagToken {
            depth -= 1
        }
        if tt != html.TextToken {
            if prevToken != html.TextToken {
                pretty += "\n"
                for i := 0; i < depth; i++ {
                    pretty += indent
                }
            }
        }
        pretty += tokenString
        // last token
        if tt == html.ErrorToken {
            break
        } else if tt == html.StartTagToken {
            depth += 1
        }
        prevToken = tt
    }
    return strings.Trim(pretty, "\n"), nil
}

It handles simple examples, like the one you provided. For example,

html := `<!DOCTYPE html><html><head>
<title>Website Title</title>
</head><body>
<div class="random-class">
<h1>I like pie</h1><p>It's true!</p></div>
</body></html>`
pretty, _ := Prettify(html, "    ")
fmt.Println(pretty)

will print the following:

<!DOCTYPE html>
<html>
    <head>
        <title>Website Title</title>
    </head>
    <body>
        <div class="random-class">
            <h1>I like pie</h1>
            <p>It's true!</p>
        </div>
    </body>
</html>

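Beware, though: this simple approach doesn't yet handle HTML comments, nor does it handle perfectly valid self-closing HTML5 tags that are not XHTML-compliant, like <br>. Whitespace is not guaranteed to be preserved where it should be, and there is a whole range of other edge cases I haven't thought of yet. Use it only as a reference, a toy, or a starting point.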
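
One of the gaps named above, tags like <br> that never get an end tag, can be narrowed by not increasing the depth after them. The sketch below uses the list of void elements from the HTML standard. The helpers isVoid and nextDepth are hypothetical, and wiring them into Prettify (for example by reading the tag name from the tokenizer with z.TagName()) is an assumption left to the reader.

```go
package main

import (
	"fmt"
	"strings"
)

// voidElements lists the HTML elements that never take an end tag.
var voidElements = map[string]bool{
	"area": true, "base": true, "br": true, "col": true, "embed": true,
	"hr": true, "img": true, "input": true, "link": true, "meta": true,
	"source": true, "track": true, "wbr": true,
}

// isVoid (hypothetical helper) reports whether tag is a void element.
// Tag names are compared case-insensitively.
func isVoid(tag string) bool {
	return voidElements[strings.ToLower(tag)]
}

// nextDepth (hypothetical helper) returns the indentation depth to use
// after a start tag: unchanged for void elements, one deeper otherwise.
func nextDepth(depth int, tag string) int {
	if isVoid(tag) {
		return depth
	}
	return depth + 1
}

func main() {
	depth := 0
	// Simulate a run of start tags; <br> and <img> must not indent
	// whatever follows them.
	for _, tag := range []string{"body", "p", "br", "img"} {
		fmt.Printf("%s<%s>\n", strings.Repeat("    ", depth), tag)
		depth = nextDepth(depth, tag)
	}
}
```

In Prettify, the equivalent change would replace the unconditional `depth += 1` for start tags with a call like nextDepth, so that a stray <br> no longer pushes every following line one level too deep.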
