How can I get the content of an html.Node?
Question
I would like to get data from a URL using the third-party Go library at http://godoc.org/code.google.com/p/go.net/html. However, I ran into a problem: I can't get the content of an html.Node.
The reference documentation includes this example:
    s := `<p>Links:</p><ul><li><a href="foo">Foo</a><li><a href="/bar/baz">BarBaz</a></ul>`
    doc, err := html.Parse(strings.NewReader(s))
    if err != nil {
        log.Fatal(err)
    }
    var f func(*html.Node)
    f = func(n *html.Node) {
        if n.Type == html.ElementNode && n.Data == "a" {
            for _, a := range n.Attr {
                if a.Key == "href" {
                    fmt.Println(a.Val)
                    break
                }
            }
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }
    }
    f(doc)
The output is:
foo
/bar/baz
If I want to get the link text instead:
Foo
BarBaz
what should I do?
Answer 1
Score: 10
The tree for <a href="link"><strong>Foo</strong>Bar</a> looks roughly like this:
- ElementNode "a" (this node also carries the list of attributes)
  - ElementNode "strong"
    - TextNode "Foo"
  - TextNode "Bar"
So, assuming you want the plain text of the link (e.g. FooBar), you have to walk through the tree and collect all text nodes. For example:
    func collectText(n *html.Node, buf *bytes.Buffer) {
        if n.Type == html.TextNode {
            buf.WriteString(n.Data)
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            collectText(c, buf)
        }
    }
And the corresponding change to your function:
    var f func(*html.Node)
    f = func(n *html.Node) {
        if n.Type == html.ElementNode && n.Data == "a" {
            text := &bytes.Buffer{}
            collectText(n, text)
            fmt.Println(text)
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }
    }