Posted 2022-09-30 14:48:09 in go

How to get inner HTML, or just text, of a tag?

Question

How do we get the value of anchor text per the example below? Here is my go code. I can get the value of href and title using html.ElementNode. I need to get the value of text using only golang.org/x/net/html, with no other libraries.

Example: From <a href="https:xyz.com">Text XYZ</a>, I want to get "Text XYZ".

// html.ElementNode works for getting href and title value but no text value with TextNode.
if n.Type == html.TextNode && n.Data == "a" {
    for _, a := range n.Attr {
        if a.Key == "href" {
            text = a.Val
        }
    }
}

Answer 1

Score: 0

Given the HTML:

<a href="http://example.com/1">Go to <b>example</b> 1</a>
<p>Some para text</p>
<a href="http://example.com/2">Go to <b>example</b> 2</a>

Do you expect just the text?

Go to example 1
Go to example 2

Do you expect the inner HTML?

Go to <b>example</b> 1
Go to <b>example</b> 2

Or, do you expect something else?

The following program gives either just the text or the inner HTML. Every time it finds an anchor node, it saves that node, then continues down that node’s tree. As it encounters other nodes it checks against the saved node and either appends the text of TextNodes or renders the node's HTML to a buffer. Finally, after traversing all the children and re-encountering the saved anchor node, it prints the text string and the HTML buffer, resets both vars, then nils the anchor node.

I got the idea of using a buffer and html.Render, and saving a particular node, from the question "Golang parse HTML, extract all content with <body> </body> tags".

The following is also in the Playground:

package main

import (
	"bytes"
	"fmt"
	"log"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	s := `
    <a href="http://example.com/1">Go to <b>example</b> 1</a>
    <p>Some para text</p>
    <a href="http://example.com/2">Go to <b>example</b> 2</a>
    `

	doc, err := html.Parse(strings.NewReader(s))
	if err != nil {
		log.Fatal(err)
	}

	// nAnchor is the anchor currently being traversed, or nil outside one.
	var nAnchor *html.Node
	var sTxt string
	var bufInnerHtml bytes.Buffer

	var f func(*html.Node)
	f = func(n *html.Node) {
		if n.Type == html.ElementNode && n.Data == "a" {
			nAnchor = n
		}

		if nAnchor != nil {
			// Render only the anchor's direct children. html.Render writes a
			// node's whole subtree, so rendering deeper descendants as well
			// would duplicate their content (e.g. "<b>example</b>example").
			if n.Parent == nAnchor {
				html.Render(&bufInnerHtml, n)
			}
			// Text nodes at any depth contribute to the plain text.
			if n.Type == html.TextNode {
				sTxt += n.Data
			}
		}

		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}

		// All children visited: print and reset for the next anchor.
		if n == nAnchor {
			fmt.Println("Text:", sTxt)
			fmt.Println("InnerHTML:", bufInnerHtml.String())
			sTxt = ""
			bufInnerHtml.Reset()
			nAnchor = nil
		}
	}
	f(doc)
}

Text: Go to example 1
InnerHTML: Go to <b>example</b> 1
Text: Go to example 2
InnerHTML: Go to <b>example</b> 2
