Question


Is it possible to crawl client-side rendered (CSR/JS) websites using gocolly? I need to crawl many websites, so I keep a titleXpath in the database and use it like this:

c.OnXML(titleXpath, func(e *colly.XMLElement) {
	data = append(data, e.Text)
	fmt.Println("title", e.Text)
})

Yes, no, or is there another package I should use?

Answer 1

Score: 2


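No, not with gocolly alone. gocolly is a Go scraping library that works at the HTTP level: it downloads the HTML the server returns and parses it, but it does not execute JavaScript. Any content that a page's scripts insert into the DOM after loading is therefore never part of what gocolly parses, so an XPath pointing at that content finds nothing (or an empty element).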

To scrape CSR websites, you need a headless browser or another tool that renders JavaScript before you extract data. Popular options include:

Headless Chrome driven over the Chrome DevTools Protocol (in Go, chromedp; Puppeteer is the comparable Node.js library)
Selenium WebDriver (through a Go client library such as goselenium)
