Get the HTML content of a page after the JS has loaded

Question

I'm trying to get the results of a search in the Rust documentation. I wrote this code to do it:

const HTMLParser = require('node-html-parser');
const https = require('https');

const search = "foo";
const options = {
    host: "doc.rust-lang.org",
    path: "/std/index.html?search=" + encodeURIComponent(search)
};

// https.get() sends the request itself, so no explicit request.end() is needed.
const request = https.get(options, (res) => {
    if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        return console.log(`An error occurred: ${res.statusCode}. Retry later.`);
    }
    res.setEncoding("utf8");

    let output = "";

    res.on("data", (chunk) => {
        output += chunk;
    });

    res.on("end", () => {
        const root = HTMLParser.parse(output);
        // Prints "null": the response is the static page, and the search results
        // are only rendered later by client-side JS, which never runs here.
        console.log(root.querySelector(".search-results"));
    });
});

request.on("error", (err) => console.log(`Request failed: ${err.message}`));

But when I run this code, I get the HTML content of the index.html page, as if I had requested it without the ?search=foo parameter. I found that the page changes dynamically through JS when we search for something: the base content is set to hidden and the search div becomes visible. So it seems the JS has not run when I receive the response, but I need it to get the search results from the documentation. I don't know how to do that.
Thank you in advance for your answers!

Answer 1

Score: 1

The Rust doc page does not seem to hit a backend when a search is performed. I noticed this using the browser developer tools.

It looks like the page loads a search-index file that already contains all the searchable doc entries, so the search runs entirely in the browser. You can use this JavaScript to search the docs yourself. The search logic lives in main.js.
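Because the search runs against a preloaded index rather than a server, the same idea can be tried in Node without rendering the page. The following is only a minimal sketch: `sampleIndex`, `searchIndex` and the `{ name, path, ty }` entry shape are hypothetical and chosen for illustration. The real search-index.js uses a more compact encoding that would need to be decoded first, and the real ranking in main.js is more elaborate than this substring match.

```javascript
// Hypothetical, already-decoded index entries (NOT the real search-index.js layout).
const sampleIndex = [
  { name: 'Vec', path: 'std::vec', ty: 'struct' },
  { name: 'HashMap', path: 'std::collections', ty: 'struct' },
  { name: 'vec', path: 'std', ty: 'macro' },
  { name: 'to_vec', path: 'std::slice', ty: 'fn' },
];

// Case-insensitive substring match on the item name.
// Exact matches sort first, then shorter names before longer ones.
function searchIndex(index, query) {
  const q = query.toLowerCase();
  return index
    .filter((item) => item.name.toLowerCase().includes(q))
    .sort((a, b) => {
      const aExact = a.name.toLowerCase() === q ? 0 : 1;
      const bExact = b.name.toLowerCase() === q ? 0 : 1;
      return aExact - bExact || a.name.length - b.name.length;
    });
}

const results = searchIndex(sampleIndex, 'vec');
console.log(results.map((r) => `${r.path}::${r.name}`));
```

Keeping the search as a pure function over an in-memory array means the only network step left is downloading the index file once.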

Let me know if you need more information; I have not yet worked out how the link for each doc item is generated.

EDIT

All the logic required to build the URL is in main.js; the function is shown below. If you take a close look at the aliases.js, main.js, storage.js and search-index.js files, you can reuse almost all of it to create the links and the required search output.

// Relies on two globals defined elsewhere in main.js: itemTypes and rootPath.
function buildHrefAndPath(item) {
  var displayPath;
  var href;
  var type = itemTypes[item.ty];
  var name = item.name;
  if (type === 'mod') {
    displayPath = item.path + '::';
    href = rootPath + item.path.replace(/::/g, '/') + '/' + name + '/index.html';
  } else if (type === 'primitive' || type === 'keyword') {
    displayPath = '';
    href = rootPath + item.path.replace(/::/g, '/') + '/' + type + '.' + name + '.html';
  } else if (type === 'externcrate') {
    displayPath = '';
    href = rootPath + name + '/index.html';
  } else if (item.parent !== undefined) {
    var myparent = item.parent;
    var anchor = '#' + type + '.' + name;
    var parentType = itemTypes[myparent.ty];
    if (parentType === 'primitive') {
      displayPath = myparent.name + '::';
    } else {
      displayPath = item.path + '::' + myparent.name + '::';
    }
    href = rootPath + item.path.replace(/::/g, '/') + '/' + parentType + '.' + myparent.name + '.html' + anchor;
  } else {
    displayPath = item.path + '::';
    href = rootPath + item.path.replace(/::/g, '/') + '/' + type + '.' + name + '.html';
  }
  return [displayPath, href];
}
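To try this function outside the browser, the two globals it reads have to be supplied by hand. The sketch below is a condensed adaptation of the same branching, not the original source. The `itemTypes` array here is an illustrative subset with made-up indices (the real table in main.js is longer and ordered differently), and `rootPath` is assumed to be the documentation root URL. The sample items are hypothetical as well.

```javascript
// Assumed stand-ins for the globals that main.js defines in the browser.
const itemTypes = ['mod', 'struct', 'method']; // illustrative subset, made-up indices
const rootPath = 'https://doc.rust-lang.org/';

// Condensed adaptation of buildHrefAndPath: same branches, early returns.
function buildHrefAndPath(item) {
  const type = itemTypes[item.ty];
  const name = item.name;
  const dir = rootPath + item.path.replace(/::/g, '/') + '/';
  if (type === 'mod') {
    return [item.path + '::', dir + name + '/index.html'];
  }
  if (type === 'primitive' || type === 'keyword') {
    return ['', dir + type + '.' + name + '.html'];
  }
  if (type === 'externcrate') {
    return ['', rootPath + name + '/index.html'];
  }
  if (item.parent !== undefined) {
    // Items such as methods link to an anchor on their parent's page.
    const parentType = itemTypes[item.parent.ty];
    const displayPath = parentType === 'primitive'
      ? item.parent.name + '::'
      : item.path + '::' + item.parent.name + '::';
    const anchor = '#' + type + '.' + name;
    return [displayPath, dir + parentType + '.' + item.parent.name + '.html' + anchor];
  }
  return [item.path + '::', dir + type + '.' + name + '.html'];
}

// Hypothetical decoded index entries.
const vecStruct = { ty: 1, name: 'Vec', path: 'std::vec' };
const collectionsMod = { ty: 0, name: 'collections', path: 'std' };
const pushMethod = { ty: 2, name: 'push', path: 'std::vec', parent: { ty: 1, name: 'Vec' } };

for (const item of [vecStruct, collectionsMod, pushMethod]) {
  const [displayPath, href] = buildHrefAndPath(item);
  console.log(displayPath + item.name, '->', href);
}
```

With the real `itemTypes` table copied from main.js, the same call should yield the links the search page shows, though that depends on the index entries being decoded into this `{ ty, name, path, parent }` shape first.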

huangapple
  • Posted on January 7, 2020, 02:36:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59617234.html