Question

I am trying to list all the tokens found in a web page. The core is in this function:

func find_links(httpBody io.Reader) []string {

	links := make([]string, 0)
	page := html.NewTokenizer(httpBody)
	for {
		tokenType := page.Next()
		if tokenType == html.ErrorToken {
			return links
		}
		token := page.Token()
		fmt.Println("Now token is ", token)
	}
}

When I print the output I obtain something like

Now token is  <body>
Now token is

Now token is  <header>

I don't understand what the second token is and why it is printing an extra blank line.

The full code of a working example is here, although it can't run on the playground because the http package is not available there.

Answer 1

Score: 1

The second token is a TextToken containing a newline.

Change the print to

   fmt.Printf("Now token is %v %q\n", token.Type, token.String())

to see the types of the tokens.
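
As a rough illustration of why the blank line appears: fmt.Println writes the token text and then its own newline, so a text token whose content is just "\n" shows up as an empty line. The sketch below uses only the standard library and a hypothetical helper, describeText, with made-up sample strings standing in for the text of a token. It is not part of the original code; it only shows how the %q verb makes whitespace visible.

```go
package main

import (
	"fmt"
	"strings"
)

// describeText is a hypothetical helper (not from the question's code).
// It formats text with %q so that whitespace such as "\n" is shown
// as an escape sequence instead of being printed literally.
func describeText(data string) string {
	if strings.TrimSpace(data) == "" {
		return fmt.Sprintf("whitespace-only text %q", data)
	}
	return fmt.Sprintf("text %q", data)
}

func main() {
	// Assumed sample values, standing in for the Data field of text tokens.
	for _, data := range []string{"\n", "Hello", "\n\t"} {
		fmt.Println("Now token is", describeText(data))
	}
}
```

In the tokenizer loop, the same idea could presumably be applied to token.Data whenever token.Type == html.TextToken, either to display the whitespace or to skip tokens that contain nothing else.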
