How to get inner HTML, or just text, of a tag?
Question
How do we get the value of anchor text, as in the example below? I can get the values of href and title using html.ElementNode, but I also need to get the value of the text, using only golang.org/x/net/html with no other libraries.

Example: from <a href="https://xyz.com">Text XYZ</a>, I want to get "Text XYZ".

Here is my Go code:
// html.ElementNode works for getting href and title value but no text value with TextNode.
if n.Type == html.TextNode && n.Data == "a" {
	for _, a := range n.Attr {
		if a.Key == "href" {
			text = a.Val
		}
	}
}
Answer 1
Score: 0
Given the HTML:
<a href="http://example.com/1">Go to <b>example</b> 1</a>
<p>Some para text</p>
<a href="http://example.com/2">Go to <b>example</b> 2</a>
Do you expect just the text?
Go to example 1
Go to example 2
Do you expect the inner HTML?
Go to <b>example</b> 1
Go to <b>example</b> 2
Or, do you expect something else?
The following program prints both the text and the inner HTML of each anchor. Every time it finds an anchor node, it saves that node, then continues down that node's subtree. For each node below the saved anchor, it appends the text of every TextNode to a string, and it renders each direct child of the anchor to a buffer; html.Render already writes a child's whole subtree, so deeper nodes must not be rendered again. Finally, after traversing all the children and returning to the saved anchor node, it prints the text string and the HTML buffer, resets both variables, and sets the saved anchor node back to nil.
I got the idea of using a buffer and html.Render, and of saving a particular node, from "Golang parse HTML, extract all content with <body> </body> tags".
The following is also in the Playground:
package main

import (
	"bytes"
	"fmt"
	"log"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	s := `
<a href="http://example.com/1">Go to <b>example</b> 1</a>
<p>Some para text</p>
<a href="http://example.com/2">Go to <b>example</b> 2</a>
`
	doc, err := html.Parse(strings.NewReader(s))
	if err != nil {
		log.Fatal(err)
	}

	var nAnchor *html.Node        // the <a> element currently being collected
	var sTxt string               // text accumulated for that anchor
	var bufInnerHTML bytes.Buffer // inner HTML accumulated for that anchor

	var f func(*html.Node)
	f = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			nAnchor = n
		}
		if nAnchor != nil {
			// Render only the anchor's direct children (never the <a> tag itself).
			// html.Render writes a node's whole subtree, so rendering deeper
			// nodes as well would duplicate nested text ("<b>example</b>example").
			if n.Parent == nAnchor {
				html.Render(&bufInnerHTML, n)
			}
			if n.Type == html.TextNode {
				sTxt += n.Data
			}
		}
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}
		// Back at the saved anchor: all of its descendants have been visited.
		if n == nAnchor {
			fmt.Println("Text:", sTxt)
			fmt.Println("InnerHTML:", bufInnerHTML.String())
			sTxt = ""
			bufInnerHTML.Reset()
			nAnchor = nil
		}
	}
	f(doc)
}
Output:
Text: Go to example 1
InnerHTML: Go to <b>example</b> 1
Text: Go to example 2
InnerHTML: Go to <b>example</b> 2