
How can I read a web page with Html Agility Pack

Question

I want to read a web page with this library, but some pages first run a script (for example, a "Welcome" or "Site is loading" screen, or …) and only afterwards show the page content.
On such pages, this library only gives me access to that first stage.

```csharp
using System.Linq;
using HtmlAgilityPack;

string linkUrl = "https://yoursite.com"; // HtmlWeb.Load expects an absolute URL
var doc = new HtmlWeb().Load(linkUrl);
var pTags = doc.DocumentNode.Descendants("p")
               .Select(el => el.InnerText)
               .Where(u => !string.IsNullOrEmpty(u));
// Any code about pTags
```
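To see why only the first stage is visible, here is a minimal sketch that feeds Html Agility Pack a hypothetical "loading" page whose real paragraph is added by a script. `HtmlDocument.LoadHtml` only parses the markup; the `<script>` element is kept as text and never run, so the same `<p>` query used in the question finds only the placeholder. The markup and the text "Real article text" are made up for illustration, and the code assumes the HtmlAgilityPack NuGet package and C# 9+ top-level statements.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical markup of a "loading" page: the real <p> is only created by
// the script at runtime, and Html Agility Pack never executes scripts.
const string html = @"<html><body>
<p>Site is loading...</p>
<div id=""content""></div>
<script>
  var p = document.createElement('p');
  p.textContent = 'Real article text';
  document.getElementById('content').appendChild(p);
</script>
</body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html); // parses the static markup only

// Same query as in the question: text of every non-empty <p> element.
var pTexts = doc.DocumentNode.Descendants("p")
                .Select(el => el.InnerText)
                .Where(t => !string.IsNullOrEmpty(t))
                .ToList();

foreach (var text in pTexts)
{
    Console.WriteLine(text); // prints only "Site is loading..."
}
```

The `<div id="content">` stays empty in the parsed tree, which is exactly the "first stage" described above.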

Answer 1

Score: 2

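Html Agility Pack does not execute JavaScript: it only parses the HTML that the server returns, so content that a script adds afterwards never reaches it. You need an alternative library that can actually run the page's JavaScript.

  1. AngleSharp is a fast, extensible, and well-documented HTML parser that is very tolerant of malformed HTML. The core library does not run scripts; JavaScript support comes from the separate AngleSharp.Js package (built on the Jint engine), and that support is still limited compared with a real browser.
  2. CsQuery is a lightweight jQuery port for .NET that selects elements with CSS selectors rather than XPath. It does not execute JavaScript, so on its own it has the same limitation as Html Agility Pack.
  3. dotless is a .NET port of the LESS CSS preprocessor, not an HTML parser, so it does not help with this problem.

Of these, only AngleSharp combined with AngleSharp.Js can run page scripts, so it is the one worth trying for pages that build their content with JavaScript.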

huangapple
  • Posted on 2023-06-26 16:39:37
  • Original link: https://go.coder-hub.com/76554968.html