Jsoup returns document even when some portion of page is not fully loaded

Question

doc = Jsoup.connect("https://www.valueresearchonline.com/stocks/42508/icici-bank-ltd/?").timeout(10000).userAgent("Mozilla").get();

This returns successfully even while part of the page https://www.valueresearchonline.com/stocks/42508/icici-bank-ltd/? is still loading.

  • Enable slow 3G network throttling and you can see a spinner for the Valuation and Growth & Efficiency tables.

How can I make Jsoup wait until the whole page has loaded, so that it does not fetch partial data?

Answer 1

Score: 1


There is nothing wrong with your jsoup code. The data you are looking for is fetched by a separate XHR request after the initial HTML arrives. jsoup does not execute JavaScript, so it never loads that data.
The data is available at https://www.valueresearchonline.com/stocks/overview/42508 as JSON, which you can download and process.
Example code with explanations in the comments:

import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;

String url = "https://www.valueresearchonline.com/stocks/overview/42508";
// ignoreContentType(true) is required; otherwise jsoup throws UnsupportedMimeTypeException for a JSON response
// Read the raw response body rather than doc.text(), which would run the JSON through the HTML parser
String body = Jsoup.connect(url).ignoreContentType(true).execute().body();
// Convert the text to a JSON object (org.json)
JSONObject json = new JSONObject(body);
// Get the two arrays that hold your data
JSONArray valuation = json.getJSONArray("valuation_overview_table_data");
JSONArray growth = json.getJSONArray("growth_overview_table_data");
System.out.println(valuation);
System.out.println(growth);

To find the URL of the data, I had to search through the JS files the page loads until I found it in the file script-v2__slash__stocks__slash__42621__slash__.js.
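
If the same data is needed for other stocks, the JSON endpoint can presumably be derived from the stock page URL. The following is only a minimal sketch: it assumes the numeric ID in the page URL (42508 here) is the same ID the overview endpoint expects, which holds for this one stock but has not been checked for others. The class name OverviewUrl and the method overviewUrlFor are hypothetical names used for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverviewUrl {
    // Matches the numeric stock ID in URLs shaped like /stocks/42508/icici-bank-ltd/
    private static final Pattern STOCK_ID = Pattern.compile("/stocks/(\\d+)(?:/|$)");

    // Assumption: the overview endpoint takes the same numeric ID as the stock page.
    public static Optional<String> overviewUrlFor(String pageUrl) {
        Matcher m = STOCK_ID.matcher(pageUrl);
        if (!m.find()) {
            return Optional.empty(); // no numeric stock ID in this URL
        }
        return Optional.of("https://www.valueresearchonline.com/stocks/overview/" + m.group(1));
    }

    public static void main(String[] args) {
        String page = "https://www.valueresearchonline.com/stocks/42508/icici-bank-ltd/?";
        System.out.println(overviewUrlFor(page).orElse("no stock ID found"));
    }
}
```

The resulting URL can then be passed to Jsoup.connect(...) with ignoreContentType(true) as in the example above. Returning an Optional keeps the caller from requesting a malformed URL when the page address does not contain an ID.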

huangapple
  • Posted on February 24, 2023, 14:09:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75553133.html