Go: How would you "Pretty Print"/"Prettify" HTML?
Question
In Python, PHP, and many other languages, it is possible to take an HTML document and "prettify" it. In Go, this is very easily done for JSON and XML (from a struct/interface) using the MarshalIndent function.
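For comparison, the JSON case mentioned above can be sketched with encoding/json's MarshalIndent. The Person type and the prettyJSON helper are hypothetical stand-ins, and the prefix/indent strings are arbitrary choices.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Person is a hypothetical example type; the struct tags set the JSON keys.
type Person struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// prettyJSON marshals v with no line prefix and two spaces per nesting level.
func prettyJSON(v interface{}) (string, error) {
	out, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	s, err := prettyJSON(Person{Name: "John", Age: 42})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

encoding/json also provides json.Indent, which re-indents JSON that is already a []byte. That is the string-based behaviour the question asks for, but it only works for JSON.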
Example for XML in Go:
http://play.golang.org/p/aBNfNxTEG1
package main

import (
	"encoding/xml"
	"fmt"
	"os"
)

func main() {
	type Address struct {
		City, State string
	}
	type Person struct {
		XMLName   xml.Name `xml:"person"`
		Id        int      `xml:"id,attr"`
		FirstName string   `xml:"name>first"`
		LastName  string   `xml:"name>last"`
		Age       int      `xml:"age"`
		Height    float32  `xml:"height,omitempty"`
		Married   bool
		Address
		Comment string `xml:",comment"`
	}

	v := &Person{Id: 13, FirstName: "John", LastName: "Doe", Age: 42}
	v.Comment = " Need more details. "
	v.Address = Address{"Hanga Roa", "Easter Island"}

	// prefix "  " starts every line; "    " is added per nesting level
	output, err := xml.MarshalIndent(v, "  ", "    ")
	if err != nil {
		fmt.Printf("error: %v\n", err)
		return
	}
	os.Stdout.Write(output)
}
However, this only works for converting a struct/interface into a []byte. What I want is to take a string of HTML code and indent it automatically. Example:
Raw HTML
<!doctype html><html><head>
<title>Website Title</title>
</head><body>
<div class="random-class">
<h1>I like pie</h1><p>It's true!</p></div>
</body></html>
Prettified HTML
<!doctype html>
<html>
    <head>
        <title>Website Title</title>
    </head>
    <body>
        <div class="random-class">
            <h1>I like pie</h1>
            <p>It's true!</p>
        </div>
    </body>
</html>
How would this be done just using a string?
Answer 1
Score: 16
I faced the same problem and solved it by writing an HTML formatting package in Go myself.
Here it is:
GoHTML - HTML formatter for Go
Please check this package out.
Thanks,
Keiji
Answer 2
Score: 8
I found this question when trying to figure out how to pretty-print XML in Go. Since I didn't find the answer anywhere, here's my solution:
import (
	"bytes"
	"encoding/xml"
	"io"
)

// formatXML decodes data token by token and re-encodes each token with indentation.
func formatXML(data []byte) ([]byte, error) {
	b := &bytes.Buffer{}
	decoder := xml.NewDecoder(bytes.NewReader(data))
	encoder := xml.NewEncoder(b)
	encoder.Indent("", "  ")
	for {
		token, err := decoder.Token()
		if err == io.EOF {
			// the encoder buffers its output; Flush writes it to b
			if err := encoder.Flush(); err != nil {
				return nil, err
			}
			return b.Bytes(), nil
		}
		if err != nil {
			return nil, err
		}
		if err := encoder.EncodeToken(token); err != nil {
			return nil, err
		}
	}
}
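formatXML expects well-formed XML, so typical HTML such as a bare <br> or an entity like &nbsp; makes decoder.Token fail. Newlines already present between tags are also re-emitted next to the encoder's own indentation. A possible variant, sketched below, relaxes the decoder through the existing Strict, AutoClose and Entity fields of xml.Decoder and skips whitespace-only text. The name formatLooseHTML is hypothetical. It is a sketch with known limits: dropping whitespace can glue words together between inline elements, and void elements come out as <br></br>, which is not valid HTML.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
)

// formatLooseHTML is a hypothetical variant of formatXML for HTML-ish input.
// It relaxes the decoder and skips whitespace-only text so that the encoder's
// indentation is the only layout in the output.
func formatLooseHTML(data []byte) ([]byte, error) {
	var b bytes.Buffer
	dec := xml.NewDecoder(bytes.NewReader(data))
	dec.Strict = false                // tolerate some malformed markup
	dec.AutoClose = xml.HTMLAutoClose // implicitly close void elements such as <br>
	dec.Entity = xml.HTMLEntity       // accept HTML entities such as &nbsp;

	enc := xml.NewEncoder(&b)
	enc.Indent("", "  ")
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		// Drop text made only of ASCII whitespace (e.g. newlines between tags).
		if cd, ok := tok.(xml.CharData); ok && len(bytes.Trim(cd, " \t\r\n")) == 0 {
			continue
		}
		if err := enc.EncodeToken(tok); err != nil {
			return nil, err
		}
	}
	if err := enc.Flush(); err != nil {
		return nil, err
	}
	return b.Bytes(), nil
}

func main() {
	out, err := formatLooseHTML([]byte("<div>\n<p>one<br>two&nbsp;three</p>\n</div>"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```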
Answer 3
Score: 5
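Edit: Found a great way using the XML parser. It only works when the HTML is also well-formed XML, because xml.Unmarshal rejects things like an unclosed <br>:
package main

import (
	"encoding/xml"
	"fmt"
)

func main() {
	html := "<html><head><title>Website Title</title></head><body><div class=\"random-class\"><h1>I like pie</h1><p>It's true!</p></div></body></html>"

	// node matches any element: its attributes, name, child elements and text.
	type node struct {
		Attr     []xml.Attr `xml:",any,attr"` // ",any,attr" keeps every attribute (Go 1.8+)
		XMLName  xml.Name
		Children []node `xml:",any"`
		Text     string `xml:",chardata"`
	}

	x := node{}
	if err := xml.Unmarshal([]byte(html), &x); err != nil {
		fmt.Println("error:", err)
		return
	}
	buf, err := xml.MarshalIndent(x, "", "\t")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(buf))
}
will output the following:
<html>
	<head>
		<title>Website Title</title>
	</head>
	<body>
		<div class="random-class">
			<h1>I like pie</h1>
			<p>It&#39;s true!</p>
		</div>
	</body>
</html>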
Answer 4
Score: 2
You could parse the HTML with code.google.com/p/go.net/html (now golang.org/x/net/html) and write your own version of that package's Render function, one that keeps track of indentation.
But let me warn you: you need to be careful with adding and removing whitespace in HTML. Although whitespace is not usually significant, you can have spaces appearing and disappearing in the rendered text if you're not careful.
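To make the whitespace warning concrete, the sketch below concatenates the text of a small, well-formed fragment twice. The first pass keeps all text. The second pass skips whitespace-only runs, as a naive pretty-printer would. It uses encoding/xml only to keep the example dependency-free, and visibleText is a hypothetical helper. It approximates, rather than reproduces, what a browser would render.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// visibleText concatenates the character data of a small, well-formed
// markup fragment. If dropBlank is true, it skips whitespace-only text runs,
// mimicking a pretty-printer that discards "insignificant" whitespace.
func visibleText(markup string, dropBlank bool) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(markup))
	var sb strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			return "", err
		}
		if cd, ok := tok.(xml.CharData); ok {
			if dropBlank && strings.TrimSpace(string(cd)) == "" {
				continue
			}
			sb.Write(cd)
		}
	}
}

func main() {
	const frag = "<p><b>Hello</b> <i>world</i></p>"
	keep, _ := visibleText(frag, false)
	drop, _ := visibleText(frag, true)
	fmt.Printf("%q\n%q\n", keep, drop) // the space between the two words is lost in the second
}
```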
Edit:
Here's a pretty-printer function I wrote recently. It handles some of the special cases, but not all of them.
import (
	"bytes"
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

// prettyPrint writes n and its descendants to b, two spaces per depth level.
func prettyPrint(b *bytes.Buffer, n *html.Node, depth int) {
	switch n.Type {
	case html.DocumentNode:
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			prettyPrint(b, c, depth)
		}
	case html.ElementNode:
		justRender := false
		switch {
		case n.FirstChild == nil:
			justRender = true
		case n.Data == "pre" || n.Data == "textarea":
			// whitespace is significant here, so render verbatim
			justRender = true
		case n.Data == "script" || n.Data == "style":
			// children are re-indented by prettyPrintScript below
			break
		case n.FirstChild == n.LastChild && n.FirstChild.Type == html.TextNode:
			if !isInline(n) {
				c := n.FirstChild
				c.Data = strings.Trim(c.Data, " \t\n\r")
			}
			justRender = true
		case isInline(n) && contentIsInline(n):
			justRender = true
		}
		if justRender {
			indent(b, depth)
			html.Render(b, n)
			b.WriteByte('\n')
			return
		}
		indent(b, depth)
		fmt.Fprintln(b, html.Token{
			Type: html.StartTagToken,
			Data: n.Data,
			Attr: n.Attr,
		})
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			// parentheses needed: && binds tighter than ||
			if (n.Data == "script" || n.Data == "style") && c.Type == html.TextNode {
				prettyPrintScript(b, c.Data, depth+1)
			} else {
				prettyPrint(b, c, depth+1)
			}
		}
		indent(b, depth)
		fmt.Fprintln(b, html.Token{
			Type: html.EndTagToken,
			Data: n.Data,
		})
	case html.TextNode:
		n.Data = strings.Trim(n.Data, " \t\n\r")
		if n.Data == "" {
			return
		}
		indent(b, depth)
		html.Render(b, n)
		b.WriteByte('\n')
	default:
		indent(b, depth)
		html.Render(b, n)
		b.WriteByte('\n')
	}
}

// isInline reports whether n is text, a comment, or an inline element.
func isInline(n *html.Node) bool {
	switch n.Type {
	case html.TextNode, html.CommentNode:
		return true
	case html.ElementNode:
		switch n.Data {
		case "b", "big", "i", "small", "tt", "abbr", "acronym", "cite", "dfn", "em", "kbd", "strong", "samp", "var", "a", "bdo", "img", "map", "object", "q", "span", "sub", "sup", "button", "input", "label", "select", "textarea":
			return true
		default:
			return false
		}
	default:
		return false
	}
}

// contentIsInline reports whether every descendant of n is inline.
func contentIsInline(n *html.Node) bool {
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if !isInline(c) || !contentIsInline(c) {
			return false
		}
	}
	return true
}

// indent writes two spaces per depth level.
func indent(b *bytes.Buffer, depth int) {
	depth *= 2
	for i := 0; i < depth; i++ {
		b.WriteByte(' ')
	}
}

// prettyPrintScript re-indents script/style text by counting brackets per line.
func prettyPrintScript(b *bytes.Buffer, s string, depth int) {
	for _, line := range strings.Split(s, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		depthChange := 0
		for _, c := range line {
			switch c {
			case '(', '[', '{':
				depthChange++
			case ')', ']', '}':
				depthChange--
			}
		}
		switch line[0] {
		case '.':
			indent(b, depth+1)
		case ')', ']', '}':
			indent(b, depth-1)
		default:
			indent(b, depth)
		}
		depth += depthChange
		fmt.Fprintln(b, line)
	}
}
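The bracket-counting heuristic in prettyPrintScript does not depend on the HTML package, so it can be tried in isolation. The sketch below copies indent and prettyPrintScript from the answer unchanged and adds a hypothetical formatScript wrapper. It is only a heuristic: brackets inside strings, comments or regular expressions will throw the depth off.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// indent writes two spaces per depth level (copied from the answer).
func indent(b *bytes.Buffer, depth int) {
	depth *= 2
	for i := 0; i < depth; i++ {
		b.WriteByte(' ')
	}
}

// prettyPrintScript (copied from the answer) re-indents script text by
// counting opening and closing brackets on each trimmed line.
func prettyPrintScript(b *bytes.Buffer, s string, depth int) {
	for _, line := range strings.Split(s, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		depthChange := 0
		for _, c := range line {
			switch c {
			case '(', '[', '{':
				depthChange++
			case ')', ']', '}':
				depthChange--
			}
		}
		switch line[0] {
		case '.':
			indent(b, depth+1) // continuation of a method chain
		case ')', ']', '}':
			indent(b, depth-1) // a leading closing bracket dedents its own line
		default:
			indent(b, depth)
		}
		depth += depthChange
		fmt.Fprintln(b, line)
	}
}

// formatScript is a hypothetical wrapper that returns the result as a string.
func formatScript(s string, depth int) string {
	var b bytes.Buffer
	prettyPrintScript(&b, s, depth)
	return b.String()
}

func main() {
	fmt.Print(formatScript("function f() {\nif (x) {\ny();\n}\n}", 0))
}
```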
Answer 5
Score: 2
Short answer
Use this HTML prettyprint library for Go (that I wrote, *uhum*). It has some tests and works for basic inputs, and will hopefully become more robust over time, though it isn't very robust right now. Note the Known Issues section in the readme.
Long Answer
Rolling your own HTML prettifier for simple cases is reasonably easy using the code.google.com/p/go.net/html package (now golang.org/x/net/html), which is also what the above package uses. Here is a very simple Prettify function implemented this way:
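import (
	"io"
	"strings"

	"golang.org/x/net/html"
)

// Prettify re-indents raw HTML token by token. It is a toy; see the caveats below.
func Prettify(raw string, indent string) (string, error) {
	z := html.NewTokenizer(strings.NewReader(raw))
	var pretty strings.Builder
	depth := 0
	// any non-text value works here: it makes the first tag start a new line
	prevToken := html.CommentToken
	for {
		tt := z.Next()
		tokenString := string(z.Raw())

		// strip away newline-only text tokens
		if tt == html.TextToken && strings.Trim(tokenString, "\n") == "" {
			continue
		}

		if tt == html.EndTagToken {
			depth--
		}

		// start every non-text token on a new, indented line,
		// unless it directly follows text (e.g. </h1> after "I like pie")
		if tt != html.TextToken && prevToken != html.TextToken {
			pretty.WriteString("\n")
			if depth > 0 {
				pretty.WriteString(strings.Repeat(indent, depth))
			}
		}

		pretty.WriteString(tokenString)

		// last token
		if tt == html.ErrorToken {
			break
		} else if tt == html.StartTagToken {
			depth++
		}
		prevToken = tt
	}
	// ErrorToken is also returned at the normal end of input, with Err() == io.EOF
	if err := z.Err(); err != io.EOF {
		return "", err
	}
	return strings.Trim(pretty.String(), "\n"), nil
}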
It handles simple examples, like the one you provided. For example,
html := `<!DOCTYPE html><html><head>
<title>Website Title</title>
</head><body>
<div class="random-class">
<h1>I like pie</h1><p>It's true!</p></div>
</body></html>`
pretty, _ := Prettify(html, "    ")
fmt.Println(pretty)
will print the following:
<!DOCTYPE html>
<html>
    <head>
        <title>Website Title</title>
    </head>
    <body>
        <div class="random-class">
            <h1>I like pie</h1>
            <p>It's true!</p>
        </div>
    </body>
</html>
Beware, though: this simple approach doesn't yet handle HTML comments. It also doesn't correctly handle void elements such as <br>, which are perfectly valid HTML5 even though they are not self-closed as XHTML requires. Whitespace is not guaranteed to be preserved where it matters, and there is a whole range of other edge cases I haven't thought of yet. Use it only as a reference, a toy, or a starting point.
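The <br> problem comes from the depth counter. Every StartTagToken increments depth, but a void element never gets a matching end tag. One possible fix, sketched below, assumes that the void-element list from the WHATWG HTML standard is sufficient and skips the increment for those names. voidElements and isVoidElement are hypothetical helpers. The Prettify change is shown only as a comment because it needs golang.org/x/net/html, whose Tokenizer provides the TagName method used there.

```go
package main

import "fmt"

// voidElements lists the HTML void elements: elements that never have an end tag.
var voidElements = map[string]bool{
	"area": true, "base": true, "br": true, "col": true, "embed": true,
	"hr": true, "img": true, "input": true, "link": true, "meta": true,
	"source": true, "track": true, "wbr": true,
}

// isVoidElement reports whether name (lower-case, as the tokenizer's
// TagName returns it) is an HTML void element.
func isVoidElement(name string) bool {
	return voidElements[name]
}

// In Prettify, the depth increment could then become (untested sketch):
//
//	} else if tt == html.StartTagToken {
//		name, _ := z.TagName()
//		if !isVoidElement(string(name)) {
//			depth++
//		}
//	}

func main() {
	for _, tag := range []string{"br", "img", "div", "p"} {
		fmt.Printf("%s void: %v\n", tag, isVoidElement(tag))
	}
}
```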