2017-02-24 20:56:36 | go

Why does this program not print anything?

Question

I'm trying to use Go to parse HTML. I would like to print the HTML to the terminal, and I don't understand why this doesn't print anything:

package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/html"
)

func main() {
	r, err := http.Get("https://google.com")
	if err != nil {
		log.Panicln(err)
	}

	defer func() {
		err := r.Body.Close()
		if err != nil {
			fmt.Println(err)
		}
	}()

	node, err := html.Parse(r.Body)
	if err != nil {
		log.Panicln(err)
	}
	fmt.Println(node.Data)
}

I know there are different ways to print the HTML, but I don't understand why this in particular never prints anything, no matter what website I use. Is this intended behavior?

Docs:

https://godoc.org/golang.org/x/net/html#Node

https://github.com/golang/net/blob/master/html/node.go#L38

Answer 1

Score: 2

Because html.Parse returns the HTML as a tree, and the top-level node of that tree has empty Data.
For example, if you need to extract all URLs from the HTML, walk the tree recursively:

package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/html"
)

func main() {
	r, err := http.Get("https://google.com")
	if err != nil {
		log.Panicln(err)
	}

	defer func() {
		err := r.Body.Close()
		if err != nil {
			fmt.Println(err)
		}
	}()

	node, err := html.Parse(r.Body)
	if err != nil {
		log.Panicln(err)
	}
	// The root node has empty Data, so this prints only a blank line.
	fmt.Println(node.Data)

	// f visits n and then, recursively, every descendant of n.
	var f func(*html.Node)
	f = func(n *html.Node) {
		// Print the href attribute of every <a> element.
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, a := range n.Attr {
				if a.Key == "href" {
					fmt.Println(a.Val)
					break
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}
	}
	f(node)
}

Answer 2

Score: 2

It is because html.Parse returns a tree of connected nodes, and the root node is of type DocumentNode, whose Data field is empty. That is why fmt.Println(node.Data) prints only a blank line.

For the input <html><body><p>Some content</p></body></html>, the parsed tree has the following structure (the parser inserts the missing head element):

DocumentNode
- ElementNode (html)
  - ElementNode (head)
  - ElementNode (body)
    - ElementNode (p)
      - TextNode ("Some content")

A simplistic example of how to walk the tree:

package main

import (
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

func nodeTypeAsString(nodeType html.NodeType) string {
	switch nodeType {
	case html.ErrorNode:
		return "ErrorNode"
	case html.TextNode:
		return "TextNode"
	case html.DocumentNode:
		return "DocumentNode"
	case html.ElementNode:
		return "ElementNode"
	case html.CommentNode:
		return "CommentNode"
	case html.DoctypeNode:
		return "DoctypeNode"
	}
	return "UNKNOWN"
}

func main() {
	s := "<html><body><p>Some content</p></body></html>"
	node, err := html.Parse(strings.NewReader(s))
	if err != nil {
		panic(err.Error())
	}

	// Root node
	fmt.Printf("NodeType=%s Data=%s\n", nodeTypeAsString(node.Type), node.Data)
	// Step deeper
	node = node.FirstChild
	fmt.Printf("NodeType=%s Data=%s\n", nodeTypeAsString(node.Type), node.Data)
	// Step deeper
	node = node.FirstChild
	fmt.Printf("NodeType=%s Data=%s\n", nodeTypeAsString(node.Type), node.Data)
	// Step over to sibling
	node = node.NextSibling
	fmt.Printf("NodeType=%s Data=%s\n", nodeTypeAsString(node.Type), node.Data)
	// Step deeper
	node = node.FirstChild
	fmt.Printf("NodeType=%s Data=%s\n", nodeTypeAsString(node.Type), node.Data)
	// Step deeper
	node = node.FirstChild
	fmt.Printf("NodeType=%s Data=%s\n", nodeTypeAsString(node.Type), node.Data)
}

OUTPUT:

NodeType=DocumentNode Data=
NodeType=ElementNode Data=html
NodeType=ElementNode Data=head
NodeType=ElementNode Data=body
NodeType=ElementNode Data=p
NodeType=TextNode Data=Some content
