Why does this program not print anything?
Question
I'm trying to parse HTML with Go. I would like to print the HTML to the terminal, and I don't understand why this program doesn't print anything:
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/html"
)

func main() {
	r, err := http.Get("https://google.com")
	if err != nil {
		log.Panicln(err)
	}
	defer func() {
		err := r.Body.Close()
		if err != nil {
			fmt.Println(err)
		}
	}()
	node, err := html.Parse(r.Body)
	if err != nil {
		log.Panicln(err)
	}
	fmt.Println(node.Data)
}
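I know there are other ways to print the HTML, but I don't understand why this code in particular never prints anything, no matter which website I use. Is this the intended behavior?
Docs: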
https://godoc.org/golang.org/x/net/html#Node
https://github.com/golang/net/blob/master/html/node.go#L38
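Answer 1
Score: 2
Because html.Parse returns the HTML as a tree, and the top level is empty: the root is the document node, whose Data field is an empty string, so fmt.Println(node.Data) prints only a blank line.
For example, if you need to extract all the URLs from the HTML: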
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/html"
)

func main() {
	r, err := http.Get("https://google.com")
	if err != nil {
		log.Panicln(err)
	}
	defer func() {
		err := r.Body.Close()
		if err != nil {
			fmt.Println(err)
		}
	}()
	node, err := html.Parse(r.Body)
	if err != nil {
		log.Panicln(err)
	}
	// The root is the document node; its Data is empty, so this prints a blank line.
	fmt.Println(node.Data)

	// f walks the tree depth-first and prints the href of every <a> element.
	var f func(*html.Node)
	f = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, a := range n.Attr {
				if a.Key == "href" {
					fmt.Println(a.Val)
					break
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}
	}
	f(node)
}
Hope this helps!
Answer 2
Score: 2
This is because html.Parse returns a tree of connected nodes, and the root node is of type DocumentNode, which carries no data of its own.
A simplistic example of how to walk the tree:
package main

import (
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

func nodeTypeAsString(nodeType html.NodeType) string {
	switch nodeType {
	case html.ErrorNode:
		return "ErrorNode"
	case html.TextNode:
		return "TextNode"
	case html.DocumentNode:
		return "DocumentNode"
	case html.ElementNode:
		return "ElementNode"
	case html.CommentNode:
		return "CommentNode"
	case html.DoctypeNode:
		return "DoctypeNode"
	}
	return "UNKNOWN"
}

func main() {
	s := "<html><body><p>Some content</p></body></html>"
	node, err := html.Parse(strings.NewReader(s))
	if err != nil {
		panic(err.Error())
	}
	// Root node
	fmt.Printf("NodeType=%s Data=%s\n", nodeTypeAsString(node.Type), node.Data)
	// Step deeper
	node = node.FirstChild
	fmt.Printf("NodeType=%s Data=%s\n", nodeTypeAsString(node.Type), node.Data)
	// Step deeper
	node = node.FirstChild
	fmt.Printf("NodeType=%s Data=%s\n", nodeTypeAsString(node.Type), node.Data)
	// Step over to sibling
	node = node.NextSibling
	fmt.Printf("NodeType=%s Data=%s\n", nodeTypeAsString(node.Type), node.Data)
	// Step deeper
	node = node.FirstChild
	fmt.Printf("NodeType=%s Data=%s\n", nodeTypeAsString(node.Type), node.Data)
	// Step deeper
	node = node.FirstChild
	fmt.Printf("NodeType=%s Data=%s\n", nodeTypeAsString(node.Type), node.Data)
}
OUTPUT:
NodeType=DocumentNode Data=
NodeType=ElementNode Data=html
NodeType=ElementNode Data=head
NodeType=ElementNode Data=body
NodeType=ElementNode Data=p
NodeType=TextNode Data=Some content
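The manual stepping above can be turned into a recursive walk that prints every node indented by its depth, which makes the empty root easy to see. This is only a sketch under stated assumptions: so that it runs with the standard library alone, it uses a hypothetical stand-in type `node` that mirrors just the FirstChild, NextSibling and Data fields of html.Node (plus a string `Kind` in place of the Type field), and it builds by hand the tree shown in the output above instead of calling html.Parse. The names `node`, `dump`, `treeString` and `sampleTree` are illustrative and not part of golang.org/x/net/html.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for html.Node, reduced to the fields the
// walk needs so that the sketch runs without golang.org/x/net/html.
type node struct {
	Kind        string // stand-in for html.Node.Type, e.g. "ElementNode"
	Data        string
	FirstChild  *node
	NextSibling *node
}

// dump writes n and all of its descendants to sb, one node per line,
// indenting each level by two spaces.
func dump(sb *strings.Builder, n *node, depth int) {
	fmt.Fprintf(sb, "%s%s %q\n", strings.Repeat("  ", depth), n.Kind, n.Data)
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		dump(sb, c, depth+1)
	}
}

// treeString returns the indented dump of the tree rooted at root.
func treeString(root *node) string {
	var sb strings.Builder
	dump(&sb, root, 0)
	return sb.String()
}

// sampleTree hand-builds the tree from the output above:
// document -> html -> (head, body -> p -> text).
func sampleTree() *node {
	text := &node{Kind: "TextNode", Data: "Some content"}
	p := &node{Kind: "ElementNode", Data: "p", FirstChild: text}
	body := &node{Kind: "ElementNode", Data: "body", FirstChild: p}
	head := &node{Kind: "ElementNode", Data: "head", NextSibling: body}
	htmlEl := &node{Kind: "ElementNode", Data: "html", FirstChild: head}
	return &node{Kind: "DocumentNode", FirstChild: htmlEl}
}

func main() {
	fmt.Print(treeString(sampleTree()))
}
```

The first line of the dump is `DocumentNode ""`, which is the empty root that the question's program printed. With the real package, the same loop over FirstChild and NextSibling should apply to *html.Node directly, with Kind replaced by a helper such as nodeTypeAsString from the answer.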