How to get inner HTML, or just text, of a tag?

Question

How do I get the anchor text in the example below? Here is my Go code. I can get the values of href and title using html.ElementNode, but I also need the text, and I want to get it using only golang.org/x/net/html, with no other libraries.

Example: from <a href="https://xyz.com">Text XYZ</a>, I want to get "Text XYZ".

// html.ElementNode works for getting href and title value but no text value with TextNode. 
if n.Type == html.TextNode && n.Data == "a" {
    for _, a := range n.Attr {
        if a.Key == "href" {
            text = a.Val
        }
    }
}

Answer 1

Score: 0

Given the HTML:

<a href="http://example.com/1">Go to <b>example</b> 1</a>
<p>Some para text</p>
<a href="http://example.com/2">Go to <b>example</b> 2</a>

Do you expect just the text?

Go to example 1
Go to example 2

Do you expect the inner HTML?

Go to <b>example</b> 1
Go to <b>example</b> 2

Or, do you expect something else?

The following program prints both the text and the inner HTML of each anchor. Every time it finds an anchor node, it saves that node, then continues down that node's subtree. For the nodes inside the saved anchor, it appends the text of each TextNode to a string and renders the anchor's child nodes as HTML into a buffer. Finally, after traversing all the children and returning to the saved anchor node, it prints the text string and the HTML buffer, resets both variables, and sets the saved anchor node back to nil.

I got the idea of using a buffer and html.Render, and of saving a particular node, from the question "Golang parse HTML, extract all content with <body> </body> tags".

The following is also in the Playground:

package main

import (
	"bytes"
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	s := `
    <a href="http://example.com/1">Go to <b>example</b> 1</a>
    <p>Some para text</p>
    <a href="http://example.com/2">Go to <b>example</b> 2</a>
    `

	doc, err := html.Parse(strings.NewReader(s))
	if err != nil {
		panic(err)
	}

	// The <a> element currently being collected, its text, and its inner HTML.
	var nAnchor *html.Node
	var sTxt string
	var bufInnerHtml bytes.Buffer

	var f func(*html.Node)
	f = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			nAnchor = n
		}

		if nAnchor != nil {
			// Render only the anchor's direct children: html.Render writes a
			// node's whole subtree, so rendering deeper nodes as well would
			// duplicate their content. The a tag itself is never written.
			if n.Parent == nAnchor {
				html.Render(&bufInnerHtml, n)
			}
			if n.Type == html.TextNode {
				sTxt += n.Data
			}
		}

		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}

		if n == nAnchor {
			fmt.Println("Text:", sTxt)
			fmt.Println("InnerHTML:", bufInnerHtml.String())
			sTxt = ""
			bufInnerHtml.Reset()
			nAnchor = nil
		}
	}
	f(doc)
}

Output:

Text: Go to example 1
InnerHTML: Go to <b>example</b> 1
Text: Go to example 2
InnerHTML: Go to <b>example</b> 2

huangapple
  • Published on September 30, 2022 at 14:48:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/73904960.html