How can I read a web page with Html Agility Pack?
Question
I want to read a web page with this library, but some pages first run a script (for example, a welcome or "site is loading" screen) and only then show the actual page content.
On such pages, the library only gives me access to that first stage.
using System;
using System.Linq;
using HtmlAgilityPack;
string linkUrl = "https://yoursite.com"; // placeholder URL
var doc = new HtmlWeb().Load(linkUrl);
// Collect the text of every non-empty <p> element
var pTags = doc.DocumentNode.Descendants("p").Select(el => el.InnerText)
    .Where(u => !String.IsNullOrEmpty(u));
// Any code about pTags
Answer 1
Score: 2
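Html Agility Pack only parses the HTML that the server returns; it does not execute JavaScript, so content that a script renders after the initial load never reaches the parsed document. You need a library that can run the page's scripts.
- AngleSharp is a fast, extensible, and well-documented HTML parser that is very tolerant of malformed HTML. Script execution is not part of the core library; it comes from the separate AngleSharp.Js extension, which is experimental and may not cope with script-heavy pages.
- CsQuery is a jQuery port for .NET that selects elements with CSS selectors rather than XPath. It parses HTML but does not run the page's JavaScript.
- dotless is a .NET port of the LESS CSS preprocessor, not an HTML parser, so it does not apply here.
Of these, AngleSharp together with AngleSharp.Js is the one that can help with this issue.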
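As a minimal sketch of the AngleSharp route, assuming the AngleSharp and AngleSharp.Js NuGet packages are installed (script execution comes from the AngleSharp.Js extension, not the core package): the markup below is a hypothetical stand-in for a page whose only paragraph is written by a script, loaded from memory so the example needs no network access.

```csharp
using System;
using System.Linq;
using AngleSharp;
using AngleSharp.Js; // extension package that plugs a JavaScript engine into AngleSharp

// Hypothetical page: the only <p> element is produced by the inline script.
var source = @"<!doctype html>
<html>
<head><title>Loading...</title></head>
<body>
<script>
document.write('<p>Page content</p>');
</script>
</body>
</html>";

// Enable script execution; without WithJs() the script is parsed but never run.
var config = Configuration.Default.WithJs();
var context = BrowsingContext.New(config);

// Load the markup from memory. For a live site, pass the URL to OpenAsync
// and also call WithDefaultLoader() on the configuration.
var document = await context.OpenAsync(req => req.Content(source));

// Same query as in the question: the text of every non-empty <p> element.
var paragraphs = document.QuerySelectorAll("p")
    .Select(p => p.TextContent)
    .Where(text => !string.IsNullOrWhiteSpace(text))
    .ToList();

foreach (var text in paragraphs)
{
    Console.WriteLine(text);
}
```

Pages that build their content with timers, AJAX calls, or a heavy front-end framework may need extra waiting and may go beyond what AngleSharp.Js can execute, so treat this as a starting point rather than a guaranteed fix.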