August 16, 2013 21:29:01 · go

How can I get the content of an html.Node?

Question

I would like to get data from a URL using the third-party Go library at http://godoc.org/code.google.com/p/go.net/html, but I ran into a problem: I can't get the content of an html.Node.

The package documentation includes this example:

package main

import (
	"fmt"
	"log"
	"strings"

	"golang.org/x/net/html" // formerly code.google.com/p/go.net/html
)

func main() {
	s := `<p>Links:</p><ul><li><a href="foo">Foo</a><li><a href="/bar/baz">BarBaz</a></ul>`
	doc, err := html.Parse(strings.NewReader(s))
	if err != nil {
		log.Fatal(err)
	}
	// f walks the tree depth-first and prints the href of every <a> element.
	var f func(*html.Node)
	f = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, a := range n.Attr {
				if a.Key == "href" {
					fmt.Println(a.Val)
					break
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}
	}
	f(doc)
}

The output is:

foo
/bar/baz

If I want to get this instead:

Foo
BarBaz

What should I do?

Answer 1

Score: 10

The tree of <a href="link"><strong>Foo</strong>Bar</a> looks basically like this:

ElementNode "a" (this node also includes a list of attributes)
- ElementNode "strong"
  - TextNode "Foo"
- TextNode "Bar"

So, assuming you want the plain text of the link (e.g. FooBar), you have to walk through the tree and collect all text nodes. For example:

// collectText appends the text of every TextNode in n's subtree, in document order, to buf.
func collectText(n *html.Node, buf *bytes.Buffer) {
	if n.Type == html.TextNode {
		buf.WriteString(n.Data)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		collectText(c, buf)
	}
}

Then change your function like this:

var f func(*html.Node)
f = func(n *html.Node) {
	if n.Type == html.ElementNode && n.Data == "a" {
		text := &bytes.Buffer{}
		collectText(n, text)
		fmt.Println(text)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		f(c)
	}
}
