May 28, 2021, 14:40:37 · go

html.Parse function returns nil instead of parsed HTML

Question

I have started learning Go and am trying to run the program below, but html.Parse from golang.org/x/net/html seems to return nil when I try to get the parsed HTML. I have tried several things but can't work out what is going on, so I would appreciate an explanation of what happens under the hood. Thanks.

package main

import (
	"fmt"
	"os"

	"golang.org/x/net/html"
)

func main() {
	doc, err := html.Parse(os.Stdin)
	if err != nil {
		fmt.Fprintf(os.Stderr, "findlinks1: %v\n", err)
		os.Exit(1)
	}
	fmt.Println(doc)
	for _, link := range visit(nil, doc) {
		fmt.Printf("link is %v", link)
		fmt.Println(link)
	}
}

func visit(links []string, n *html.Node) []string {
	if n.Type == html.ElementNode && n.Data == "a" {
		for _, a := range n.Attr {
			if a.Key == "href" {
				links = append(links, a.Val)
			}
		}
	}
	if c := n.FirstChild; c != nil {
		c = c.NextSibling
		links = visit(links, c)
	}
	return links
}

I'm using Go version 1.13.8 and my operating system is Ubuntu 20.04 LTS.
When I print doc I get this output:

&{<nil> 0xc0000ca070 0xc0000ca0e0 <nil> <nil> 2    []}

Answer 1

Score: 2

Your parsed document isn't nil; if it were, fmt.Println would print only <nil>, not something like &{...}.
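
To make that distinction concrete, here is a minimal sketch of how fmt prints a nil pointer compared with a pointer to a struct. The item type and the describe helper are hypothetical names introduced only for illustration; they are not part of golang.org/x/net/html.

```go
package main

import "fmt"

// item is a hypothetical stand-in for a struct with a pointer field,
// loosely modelled on how html.Node links to other nodes.
type item struct {
	Parent *item
	Data   string
}

// describe returns the default fmt representation of the pointer.
func describe(n *item) string {
	return fmt.Sprint(n)
}

func main() {
	var missing *item
	fmt.Println(describe(missing))           // prints: <nil>
	fmt.Println(describe(&item{Data: "a"})) // prints: &{<nil> a}
}
```

A non-nil pointer to a struct is printed as &{...} with the field values inside, and any nil pointer field shows up as <nil>, which is the pattern visible in the output from the question.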

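Visiting all children requires a loop, yet your code only checks whether n has a first child, and when it does, it skips that child and recurses into its next sibling instead. Every other child is never visited, and if the first child has no sibling, visit is called with a nil node and n.Type panics.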

To visit all children, use a loop like this:

for c := n.FirstChild; c != nil; c = c.NextSibling {
	links = visit(links, c)
}

Testing it:

s := `<a href="http://first.com">first</a><b><a href="http://second.com">second</a></b>`
doc, err := html.Parse(strings.NewReader(s))
if err != nil {
	fmt.Fprintf(os.Stderr, "findlinks1: %v\n", err)
	os.Exit(1)
}
fmt.Println(doc)
for _, link := range visit(nil, doc) {
	fmt.Println("link is", link)
}

This outputs:

&{<nil> 0xc00012e070 0xc00012e070 <nil> <nil> 2    []}
link is http://first.com
link is http://second.com
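
For anyone who wants to try the traversal without fetching golang.org/x/net/html, the following is a rough, self-contained sketch. It assumes a hypothetical node type that keeps only the FirstChild and NextSibling links (plus a tag name and an href) and a hypothetical sampleTree helper that builds the same shape as the test HTML above. It illustrates the loop, and is not a replacement for the real parser.

```go
package main

import "fmt"

// node is a hypothetical stand-in for html.Node that keeps only the
// fields the traversal needs.
type node struct {
	Tag         string
	Href        string // empty when the element has no href attribute
	FirstChild  *node
	NextSibling *node
}

// visit appends the href of every "a" node in the subtree rooted at n,
// recursing into each child in document order.
func visit(links []string, n *node) []string {
	if n.Tag == "a" && n.Href != "" {
		links = append(links, n.Href)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		links = visit(links, c)
	}
	return links
}

// sampleTree builds a root with two children: an "a" node, and a "b"
// node that wraps a second "a" node.
func sampleTree() *node {
	second := &node{Tag: "a", Href: "http://second.com"}
	b := &node{Tag: "b", FirstChild: second}
	first := &node{Tag: "a", Href: "http://first.com", NextSibling: b}
	return &node{Tag: "root", FirstChild: first}
}

func main() {
	for _, link := range visit(nil, sampleTree()) {
		fmt.Println("link is", link)
	}
}
```

Because the for loop starts at FirstChild and stops when NextSibling is nil, visit is never called with a nil node, and both links are collected.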
