How can I get the content of an html.Node?
Question
I would like to get data from a URL using the third-party Go library at http://godoc.org/code.google.com/p/go.net/html. However, I ran into a problem: I can't get the content of an html.Node.
The reference documentation contains an example, shown here:
package main

import (
	"fmt"
	"log"
	"strings"

	"golang.org/x/net/html" // formerly code.google.com/p/go.net/html
)

func main() {
	s := `<p>Links:</p><ul><li><a href="foo">Foo</a><li><a href="/bar/baz">BarBaz</a></ul>`
	doc, err := html.Parse(strings.NewReader(s))
	if err != nil {
		log.Fatal(err)
	}
	// f walks the tree depth-first and prints the href of every <a> element.
	var f func(*html.Node)
	f = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			for _, a := range n.Attr {
				if a.Key == "href" {
					fmt.Println(a.Val)
					break
				}
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}
	}
	f(doc)
}
The output is:
foo
/bar/baz
What should I do if I want to get the link text instead, i.e.:
Foo
BarBaz
Answer 1
Score: 10
The tree of <a href="link"><strong>Foo</strong>Bar</a> looks basically like this:
- ElementNode "a" (this node also includes a list of attributes)
  - ElementNode "strong"
    - TextNode "Foo"
  - TextNode "Bar"
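To make the outline concrete, here is a small sketch that builds the same tree by hand and prints it with the same indentation, following FirstChild down and NextSibling across. So that it runs with the standard library alone, it assumes a hypothetical minimal node type standing in for html.Node (only Type, Data, FirstChild and NextSibling are modelled; the href attribute is left out). With the real package (now golang.org/x/net/html), html.Parse builds this tree for you, wrapped inside html, head and body elements.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeType and node are hypothetical stand-ins for html.NodeType and
// html.Node, reduced to the fields this sketch needs.
type nodeType int

const (
	textNode nodeType = iota
	elementNode
)

type node struct {
	Type        nodeType
	Data        string // tag name for elements, the text itself for text nodes
	FirstChild  *node
	NextSibling *node
}

// sampleLink builds <a href="link"><strong>Foo</strong>Bar</a> by hand
// (the href attribute is not modelled).
func sampleLink() *node {
	bar := &node{Type: textNode, Data: "Bar"}
	strong := &node{
		Type:        elementNode,
		Data:        "strong",
		FirstChild:  &node{Type: textNode, Data: "Foo"},
		NextSibling: bar,
	}
	return &node{Type: elementNode, Data: "a", FirstChild: strong}
}

// dump writes one line per node, indented two spaces per level of depth.
func dump(n *node, depth int, sb *strings.Builder) {
	kind := "TextNode"
	if n.Type == elementNode {
		kind = "ElementNode"
	}
	fmt.Fprintf(sb, "%s- %s %q\n", strings.Repeat("  ", depth), kind, n.Data)
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		dump(c, depth+1, sb)
	}
}

// treeString returns the whole outline as a single string.
func treeString(n *node) string {
	var sb strings.Builder
	dump(n, 0, &sb)
	return sb.String()
}

func main() {
	fmt.Print(treeString(sampleLink()))
}
```

The printed nesting matches the outline: "Foo" is a grandchild of the a element rather than a direct child, so the text cannot be read from a single node.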
So, assuming you want the plain text of the link (e.g. FooBar), you have to walk through the subtree below the link and collect all of its text nodes. For example:
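// collectText appends the data of every text node in n's subtree to buf,
// in document order. Requires importing "bytes".
func collectText(n *html.Node, buf *bytes.Buffer) {
	if n.Type == html.TextNode {
		buf.WriteString(n.Data)
	}
	// Recurse into the children so text nested in <strong> etc. is included.
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		collectText(c, buf)
	}
}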
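Here is a runnable sketch of the same depth-first collection applied to the example link. To stay standard-library-only it uses a hypothetical stand-in node type in place of html.Node, and it swaps bytes.Buffer for strings.Builder (either works, since only WriteString and the final string are needed). textContent is a hypothetical convenience wrapper, not part of the html package.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for html.NodeType and html.Node (only the fields used here).
type nodeType int

const (
	textNode nodeType = iota
	elementNode
)

type node struct {
	Type        nodeType
	Data        string
	FirstChild  *node
	NextSibling *node
}

// collectText appends the data of every text node in n's subtree,
// depth-first, so the text comes out in document order.
func collectText(n *node, sb *strings.Builder) {
	if n.Type == textNode {
		sb.WriteString(n.Data)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		collectText(c, sb)
	}
}

// textContent returns the concatenated text below n.
func textContent(n *node) string {
	var sb strings.Builder
	collectText(n, &sb)
	return sb.String()
}

// sampleLink builds <a href="link"><strong>Foo</strong>Bar</a> by hand
// (the href attribute is not modelled).
func sampleLink() *node {
	bar := &node{Type: textNode, Data: "Bar"}
	strong := &node{
		Type:        elementNode,
		Data:        "strong",
		FirstChild:  &node{Type: textNode, Data: "Foo"},
		NextSibling: bar,
	}
	return &node{Type: elementNode, Data: "a", FirstChild: strong}
}

func main() {
	fmt.Println(textContent(sampleLink())) // prints FooBar
}
```

Reading only the link's first child would give the tag name "strong" rather than any text, because that child is an element node; walking the whole subtree avoids this.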
And here are the changes to your function:
var f func(*html.Node)
f = func(n *html.Node) {
	if n.Type == html.ElementNode && n.Data == "a" {
		// Print the link's plain text instead of its href attribute.
		text := &bytes.Buffer{}
		collectText(n, text)
		fmt.Println(text.String())
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		f(c)
	}
}
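Combining the question's href lookup with the text collection, the sketch below walks a hand-built copy of the question's ul list and records each link's href and text in one pass. As before, node and attribute are hypothetical stand-ins for html.Node and html.Attribute (the real Attribute has Namespace, Key and Val fields; only Key and Val are modelled), and link, collectLinks, anchor and sampleList are illustrative names. With the real package you would pass the document returned by html.Parse instead of sampleList().

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for html.NodeType, html.Attribute and html.Node.
type nodeType int

const (
	textNode nodeType = iota
	elementNode
)

type attribute struct{ Key, Val string }

type node struct {
	Type        nodeType
	Data        string
	Attr        []attribute
	FirstChild  *node
	NextSibling *node
}

// link pairs an anchor's href with its plain text.
type link struct{ Href, Text string }

// collectText appends the data of every text node under n, in document order.
func collectText(n *node, sb *strings.Builder) {
	if n.Type == textNode {
		sb.WriteString(n.Data)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		collectText(c, sb)
	}
}

// collectLinks walks the whole tree and records href and text for each <a>.
func collectLinks(n *node, out []link) []link {
	if n.Type == elementNode && n.Data == "a" {
		l := link{}
		for _, a := range n.Attr {
			if a.Key == "href" {
				l.Href = a.Val
				break
			}
		}
		var sb strings.Builder
		collectText(n, &sb)
		l.Text = sb.String()
		out = append(out, l)
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		out = collectLinks(c, out)
	}
	return out
}

// anchor builds <a href="href">text</a>.
func anchor(href, text string) *node {
	return &node{
		Type:       elementNode,
		Data:       "a",
		Attr:       []attribute{{Key: "href", Val: href}},
		FirstChild: &node{Type: textNode, Data: text},
	}
}

// sampleList builds the question's list by hand:
// <ul><li><a href="foo">Foo</a><li><a href="/bar/baz">BarBaz</a></ul>
func sampleList() *node {
	li2 := &node{Type: elementNode, Data: "li", FirstChild: anchor("/bar/baz", "BarBaz")}
	li1 := &node{Type: elementNode, Data: "li", FirstChild: anchor("foo", "Foo"), NextSibling: li2}
	return &node{Type: elementNode, Data: "ul", FirstChild: li1}
}

func main() {
	for _, l := range collectLinks(sampleList(), nil) {
		fmt.Printf("%s\t%s\n", l.Href, l.Text)
	}
}
```

Returning a slice instead of printing inside the walk keeps the traversal reusable, for example to filter or sort links before output.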