html.Parse function returns nil instead of parsed html
Question
I've started learning Go and I'm trying to run this program, but html.Parse from golang.org/x/net/html seems to return nil when I try to get the parsed HTML. I've tried different things but can't figure out what's going on, so I'd appreciate it if someone could explain what happens under the hood. Thanks.
package main

import (
	"fmt"
	"os"

	"golang.org/x/net/html"
)

func main() {
	doc, err := html.Parse(os.Stdin)
	if err != nil {
		fmt.Fprintf(os.Stderr, "findlinks1: %v\n", err)
		os.Exit(1)
	}
	fmt.Println(doc)
	for _, link := range visit(nil, doc) {
		fmt.Printf("link is %v", link)
		fmt.Println(link)
	}
}

func visit(links []string, n *html.Node) []string {
	if n.Type == html.ElementNode && n.Data == "a" {
		for _, a := range n.Attr {
			if a.Key == "href" {
				links = append(links, a.Val)
			}
		}
	}
	if c := n.FirstChild; c != nil {
		c = c.NextSibling
		links = visit(links, c)
	}
	return links
}
I'm using Go version 1.13.8 and my operating system is Ubuntu 20.04 LTS.
When I print doc, I get this output:
&{<nil> 0xc0000ca070 0xc0000ca0e0 <nil> <nil> 2 []}
Answer 1
Score: 2
Your parsed document isn't nil; otherwise you'd see only nil printed, not something like &{...}.
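To illustrate the difference, here is a minimal sketch using a hypothetical stand-in type named node (it is not the real html.Node, and the render helper is made up for this example). It only shows how the fmt package formats a nil pointer versus a non-nil pointer to a struct:

```go
package main

import "fmt"

// node is a hypothetical stand-in for html.Node, used only to show
// how fmt formats pointers to structs.
type node struct {
	Parent *node
	Type   int
	Data   string
}

// render returns what fmt.Println would print for n, minus the newline.
func render(n *node) string {
	return fmt.Sprint(n)
}

func main() {
	var missing *node
	fmt.Println(render(missing))        // prints <nil>
	fmt.Println(render(&node{Type: 2})) // prints a value of the form &{...}
}
```

So the &{...} in your output means html.Parse handed back a real node; the fields shown as <nil> inside the braces are just pointer fields of that node that happen to be unset.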
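Visiting all children requires a loop, yet you only check whether the n node has a first child, and if it does, you don't even use it but move on to its next sibling. This makes no sense. Worse, if that first child has no next sibling, c becomes nil and the recursive call dereferences a nil pointer.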
To visit all children, use a loop like this:
for c := n.FirstChild; c != nil; c = c.NextSibling {
	links = visit(links, c)
}
Testing it:
// Requires "strings" in the import list.
s := `<a href="http://first.com">first</a><b><a href="http://second.com">second</a></b>`
doc, err := html.Parse(strings.NewReader(s))
if err != nil {
	fmt.Fprintf(os.Stderr, "findlinks1: %v\n", err)
	os.Exit(1)
}
fmt.Println(doc)
// visit takes the accumulated links as its first argument; start with nil.
for _, link := range visit(nil, doc) {
	fmt.Println("link is", link)
}
Which outputs:
&{<nil> 0xc00012e070 0xc00012e070 <nil> <nil> 2 []}
link is http://first.com
link is http://second.com