Jsoup returns the document even when part of the page has not finished loading
Question
doc = Jsoup.connect("https://www.valueresearchonline.com/stocks/42508/icici-bank-ltd/?").timeout(10000).userAgent("Mozilla").get();
This returns successfully even while part of the page https://www.valueresearchonline.com/stocks/42508/icici-bank-ltd/? is still loading.
- Enable slow 3G network throttling in the browser and you can see loading spinners for the "Valuation" and "Growth & Efficiency" tables.
How can I make Jsoup wait until the whole page has loaded, so that it does not fetch only partial data?
Answer 1
Score: 1
There is nothing wrong with your jsoup code. The data you are looking for is fetched by an XHR request that the page's JavaScript sends after the initial HTML has loaded. jsoup only downloads and parses the HTML; it does not execute JavaScript, so it never makes that request and cannot be told to wait for it.
The data is available as JSON at https://www.valueresearchonline.com/stocks/overview/42508, which you can download and process directly.
Example code with explanations in the comments:
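import java.io.IOException;
import org.json.JSONArray;
import org.json.JSONObject;
import org.jsoup.Jsoup;
public class ValueResearchOverview {
    public static void main(String[] args) throws IOException {
        String url = "https://www.valueresearchonline.com/stocks/overview/42508";
        // Must add ignoreContentType(true), otherwise jsoup rejects the JSON (non-HTML) response
        // execute().body() returns the raw JSON text instead of running it through the HTML parser
        String body = Jsoup.connect(url).ignoreContentType(true).execute().body();
        // Convert the text to a JSON object
        JSONObject json = new JSONObject(body);
        // Get the two arrays that hold your data
        JSONArray valuation = json.getJSONArray("valuation_overview_table_data");
        JSONArray growth = json.getJSONArray("growth_overview_table_data");
        System.out.println(valuation);
        System.out.println(growth);
    }
}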
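If you would rather not route a JSON request through jsoup at all, the same endpoint can also be requested with the JDK's built-in java.net.http.HttpClient (Java 11+). This is a minimal sketch, assuming the endpoint answers a plain GET with JSON and needs no cookies or extra headers; the class and helper names (OverviewFetch, buildRequest, fetchOverview) are hypothetical. It returns only the raw JSON string, which you would still parse with a JSON library such as org.json, as in the answer.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class OverviewFetch {
    // Hypothetical helper: builds the GET request for the JSON endpoint named in the answer.
    static HttpRequest buildRequest(String stockId) {
        return HttpRequest.newBuilder(
                        URI.create("https://www.valueresearchonline.com/stocks/overview/" + stockId))
                .timeout(Duration.ofSeconds(10))        // same 10 s budget as the jsoup call in the question
                .header("User-Agent", "Mozilla")        // same user agent as the question
                .header("Accept", "application/json")   // assumption: the endpoint serves JSON for this Accept value
                .GET()
                .build();
    }

    // Hypothetical helper: sends the request and returns the raw JSON text, failing on non-200 responses.
    static String fetchOverview(HttpClient client, String stockId) throws Exception {
        HttpResponse<String> response =
                client.send(buildRequest(stockId), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        System.out.println(fetchOverview(client, "42508"));
    }
}
```

Separating request construction from sending keeps the URL and headers easy to inspect without touching the network.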
To find the URL of the data, I searched through the JS files loaded by the page until I found it in the file script-v2__slash__stocks__slash__42621__slash__.js.
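Searching scripts by hand, as described above, can be given a rough first pass in code: pull out quoted, site-relative path literals and keep those containing a keyword. This is a hypothetical helper (EndpointScanner, findCandidatePaths), assuming the endpoint path appears as a plain string literal in the script; paths assembled at runtime or hidden by minification will be missed, so the browser's network tab remains the more reliable check. The sample script text is made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndpointScanner {
    // Matches quoted string literals that look like site-relative paths, e.g. '/stocks/overview/'.
    // Assumption: endpoints appear as plain literals; computed or obfuscated URLs are not found.
    private static final Pattern PATH_LITERAL =
            Pattern.compile("[\"'](/[A-Za-z0-9_\\-./]*)[\"']");

    // Hypothetical helper: returns distinct path literals containing the keyword, in first-seen order.
    static List<String> findCandidatePaths(String jsSource, String keyword) {
        Set<String> found = new LinkedHashSet<>();   // keeps order, drops duplicates
        Matcher m = PATH_LITERAL.matcher(jsSource);
        while (m.find()) {
            String path = m.group(1);
            if (path.contains(keyword)) {
                found.add(path);
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        // Made-up script text; not taken from the real site.
        String js = "fetch('/stocks/overview/' + id).then(r => r.json());"
                + "var img = \"/static/logo.png\";"
                + "$.get(\"/stocks/overview/\" + stockId);";
        System.out.println(findCandidatePaths(js, "overview")); // prints [/stocks/overview/]
    }
}
```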