Unexpected HTML token from html.NewTokenizer.Token()
Question
I am trying to list all the tokens found in a web page. The core of the program is this function:
	func find_links(httpBody io.Reader) []string {
		links := make([]string, 0)
		page := html.NewTokenizer(httpBody)
		for {
			tokenType := page.Next()
			if tokenType == html.ErrorToken {
				return links
			}
			token := page.Token()
			fmt.Println("Now token is ", token)
		}
	}
When I print the output, I get something like this:
Now token is <body>
Now token is
Now token is <header>
I don't understand what the second token is and why it is printing an extra blank line.
The full code of a working example is here, though it can't run on the Playground because of the missing http package.
Answer 1
Score: 1
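The second token is a TextToken containing a newline. Println writes that newline as the token's text, which is where the extra blank line comes from.
Change the print to
	fmt.Printf("Now token is %v %q\n", token.Type, token)
to see the type of each token. The %q verb quotes the token's text, so the newline shows up as \n.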