How do I scrape the heading text of an h1 tag using Go?
Question
Suppose this is an h1 tag:
<h1>FindMe</h1>
It sits in a huge web page with many other h1 tags, but it is the first one. I am using the net/html package and searching for the first StartTagToken. Once my program has found that token, how do I get the text written inside the heading, i.e. FindMe in this case?
This is the code I have right now:
z := html.NewTokenizer(body)
for {
	tt := z.Next()
	if tt == html.ErrorToken {
		// End of input or a read error: stop scanning.
		return
	} else if tt == html.StartTagToken {
		tag := z.Token()
		if tag.Data == "h1" {
			fmt.Println("We found the title")
			// some code to find what is stored in the heading
		}
	}
}
How do I go about doing that?
EDIT: More specifically, which property of the variable tag gives me the text inside it? I may be wrong about the conceptual terms here, so please bear with me.
Answer 1
Score: 1
What you got is the StartTagToken. The part you're interested in is the TextToken that sits between it and the corresponding EndTagToken. So you need to read the next token, and its Data should be the value you're after, something like:
...
if tag.Data == "h1" {
	// The heading's text is the next token, not a field of the start tag.
	if tt = z.Next(); tt == html.TextToken {
		fmt.Println(z.Token().Data)
	}
}