How to scrape the text inside an h1 tag using Golang?

Question

Suppose this is an h1 tag:

<h1>FindMe</h1>

It sits in a large web page with many other h1 tags, but it is the first one. I am using the net/html package and searching for the first StartTagToken. Once my program has found that token, how do I get the text written inside the heading, i.e. FindMe in this case?

This is the code I have right now:

z := html.NewTokenizer(body)

for {
	tt := z.Next()

	if tt == html.ErrorToken {
		// End of the document (or a parse error): stop scanning.
		return
	} else if tt == html.StartTagToken {
		tag := z.Token()

		if tag.Data == "h1" {
			fmt.Println("We found the title")
			// some code to find what is stored in the heading
		}
	}
}

How do I go about doing that?

EDIT: More specifically, which property of the variable tag gives me the text inside it? I may be using the wrong conceptual terms here, so please bear with me.

Answer 1

Score: 1

What you have is the StartTagToken. The part you are interested in sits between it and the corresponding EndTagToken, as a TextToken. So you need to read the next token, and its Data should be the value you are after (assuming the heading contains plain text directly, with no nested tags), something like:

...
if tag.Data == "h1" {
	if tt = z.Next(); tt == html.TextToken {
		fmt.Println(z.Token().Data)
	}
}

huangapple
  • Posted on January 6, 2017, 00:49:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/41490400.html