How to scrape the text inside an h1 tag using golang?
Question
Suppose this is an h1 tag:
<h1>FindMe</h1>
It sits in a huge web page with many other h1 tags, but it is the first one. I am using the net/html package and searching for the first StartTagToken. Once my program has found that token, how do I get the text written inside the heading, i.e. FindMe in this case?
This is the code I have right now:
z := html.NewTokenizer(body)
for {
	tt := z.Next()
	if tt == html.ErrorToken {
		// End of input (or a parse error): stop scanning.
		return
	} else if tt == html.StartTagToken {
		tag := z.Token()
		if tag.Data == "h1" {
			fmt.Println("We found the title\n")
			// some code to find what is stored in the heading
		}
	}
}
How do I go about doing that?
EDIT: More specifically, which property of the variable tag gives me the text inside the element? I may be using the wrong terms here, so please bear with me.
Answer 1
Score: 1
What you have is the StartTagToken. The part you're interested in sits between it and the corresponding EndTagToken, as a TextToken. So you need to read the next token, and its Data should be the value you're after (assuming the heading contains plain text and no nested markup), something like:
...
if tag.Data == "h1" {
	if tt = z.Next(); tt == html.TextToken {
		fmt.Println(z.Token().Data)
	}
}