2014-10-01 10:50:30 · go

HTML - find all the sub-tags in a given tag

Question

Assume I have an HTML page that contains something like:

<ul class="good">
    <li>1</li>
    <li>2</li>
    <li>3</li>
</ul>

<ul class="bad">
    <li>a</li>
    <li>b</li>
    <li>c</li>
</ul>

I want to grab the <li> elements inside the first <ul>. I basically copied the following code from here (note: code edited per @twotwotwo's comment):

page, _ := html.Parse(httpBody)
var f func(*html.Node)
f = func(n *html.Node) {
	//fmt.Println("Inside f")
	if n.Type == html.ElementNode && n.Data == "ul" {
		fmt.Println("ul found ->  ", n)
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}
	} else {
		fmt.Println(n.Data, "is not the correct one")
		for c := n.FirstChild; c != nil; c = c.NextSibling {
			f(c)
		}
	}
}
f(page)

But the only output I get is:

 is not the correct one
html is not the correct one
head is not the correct one
body is not the correct one

I wonder why the recursion stops at body. I tried it with motherfuckingwebsite.com, which has tags inside the body.

P.S.
I have also tried

page := html.NewTokenizer(httpBody)

for {
    tokenType := page.Next()
    if tokenType == html.ErrorToken {
        return links
    }
    token := page.Token()

but this seems to show all the tokens, without regard for the tree structure.
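A flat token stream can still respect nesting if the loop keeps a depth counter of its own. The sketch below illustrates that idea under some loud assumptions: it uses the standard library's encoding/xml decoder as a stand-in for html.Tokenizer, which only works because the sample markup happens to be well-formed, and the names liTexts and hasClass are hypothetical helpers, not part of any library. With html.Tokenizer the same bookkeeping would presumably hang off html.StartTagToken and html.EndTagToken instead.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"log"
	"strings"
)

// hasClass reports whether the start tag carries the given class name.
func hasClass(t xml.StartElement, class string) bool {
	for _, a := range t.Attr {
		if a.Name.Local != "class" {
			continue
		}
		for _, c := range strings.Fields(a.Value) {
			if c == class {
				return true
			}
		}
	}
	return false
}

// liTexts (hypothetical helper) reads markup token by token and returns the
// text of every <li> that sits inside a <ul> with the given class.
// Assumption: the markup is well-formed enough for encoding/xml.
func liTexts(markup, class string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(markup))
	var (
		out     []string
		depth   int  // > 0 while inside the matching <ul>
		inLi    bool // true while collecting the text of an <li>
		liDepth int  // depth at which the current <li> was opened
		buf     strings.Builder
	)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			switch {
			case depth > 0:
				depth++
				if t.Name.Local == "li" && !inLi {
					inLi, liDepth = true, depth
					buf.Reset()
				}
			case t.Name.Local == "ul" && hasClass(t, class):
				depth = 1 // entering the list we care about
			}
		case xml.EndElement:
			if depth > 0 {
				if inLi && depth == liDepth {
					out = append(out, strings.TrimSpace(buf.String()))
					inLi = false
				}
				depth-- // reaches 0 again when the matching </ul> closes
			}
		case xml.CharData:
			if inLi {
				buf.Write(t)
			}
		}
	}
}

func main() {
	const sample = `
<ul class="good"><li>1</li><li>2</li><li>3</li></ul>
<ul class="bad"><li>a</li><li>b</li><li>c</li></ul>`
	items, err := liTexts(sample, "good")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(items) // prints [1 2 3]
}
```

Real-world HTML is frequently not valid XML, so this is only meant to show the depth-tracking technique, not to recommend encoding/xml for scraping.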

Answer 1

Score: 4

I have, in the past, used this package: https://github.com/PuerkitoBio/goquery

It provides a "jQuery-like" interface for querying HTML documents. With that library, it's as simple as this:

package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/PuerkitoBio/goquery"
)

var httpBody string = `
	<ul class="good">
	    <li>1</li>
	    <li>2</li>
	    <li>3</li>
	</ul>

	<ul class="bad">
	    <li>a</li>
	    <li>b</li>
	    <li>c</li>
	</ul>
`

func main() {
	// Parse the HTML from an in-memory buffer.
	b := bytes.NewBufferString(httpBody)
	doc, err := goquery.NewDocumentFromReader(b)
	if err != nil {
		log.Fatal(err)
	}

	// Select every <ul class="good">, then print the text of each <li> inside it.
	doc.Find("ul.good").Each(func(i int, ul *goquery.Selection) {
		ul.Find("li").Each(func(i int, li *goquery.Selection) {
			fmt.Println(li.Text())
		})
	})
}

Which prints:

1
2
3
