Extract page content where there is a delay in the actual content using Playwright

Question

I am trying to capture the privacy notice on this page: https://www.imaginegolf.com/privacy. However, if you look at the page, the privacy notice takes a while to load. Is there a way to make Playwright wait for it and then grab the contents of the page? I have tried the waitUntil options load, networkidle, commit, and domcontentloaded.

Sample source code

import { chromium } from 'playwright'; // Web scraping library
import * as fs from 'fs';

(async function () {
    // Launch headless Chromium
    const chromeBrowser = await chromium.launch({ headless: true });
    const context = await chromeBrowser.newContext({
        ignoreHTTPSErrors: true,
        userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    });
    const page = await context.newPage();
    // Even with 'networkidle', goto returns before the privacy notice is rendered
    await page.goto("https://www.imaginegolf.com/privacy", { waitUntil: 'networkidle', timeout: 60000 });
    const content = await page.content();
    fs.writeFileSync('test.html', content);
    console.log("done");
    await chromeBrowser.close(); // Close the browser so the script can exit
})();

Answer 1

Score: 1


You can add a check with expect and specify a timeout to verify that the privacy notice is visible. Something like this, maybe:

  // expect comes from Playwright Test: import { expect } from '@playwright/test';
  await page.goto("https://www.imaginegolf.com/privacy");
  await expect(page.locator('text="PRIVACY NOTICE"')).toBeVisible({ timeout: 5000 });
  let content = await page.content();
  fs.writeFileSync('test.html', content);

Just adapt the locator and timeout to your needs and continue with your workflow after the expect succeeds.
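If the element never shows up within the timeout, the wait throws, so it can help to decide up front what the workflow should do in that case. The following is only a sketch: `becameVisible` is a hypothetical helper name, it uses `locator.waitFor` in place of `expect`, and it assumes that Playwright reports timeouts as errors whose `name` is `TimeoutError`.

```javascript
// Hypothetical helper (not a Playwright API): resolves to true if the locator
// becomes visible within timeoutMs, and to false if the wait times out.
async function becameVisible(locator, timeoutMs = 5000) {
    try {
        await locator.waitFor({ state: 'visible', timeout: timeoutMs });
        return true;
    } catch (error) {
        // Assumption: Playwright timeouts surface as errors named 'TimeoutError'.
        if (error && error.name === 'TimeoutError') {
            return false;
        }
        // Anything else is unexpected, so let it propagate to the caller.
        throw error;
    }
}
```

With a real page this might be called as `await becameVisible(page.locator('text="PRIVACY NOTICE"'), 5000)` before `page.content()`, writing the file only when it returns true.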

Answer 2

Score: 1


The best way is probably to wait for an element that contains, or is part of, the content: basically something that indicates the content you expect has loaded. Since you're just using Playwright Library (not Playwright Test), you can't use expect to asynchronously wait for it to become visible within a time frame, but you're not trying to assert anything anyway; you just want to wait for it before moving on. So I would recommend using the waitFor method, like so:

await page.getByText('Privacy Notice').waitFor();

Or use whatever locator makes the most sense. Note that waitFor defaults to waiting until the element is visible, which is why no arguments are needed, though you can always be explicit (for example, { state: 'visible' }) if you want.
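Putting the wait and the capture together might look like the sketch below. `captureAfterText` is a hypothetical helper name, `page` is assumed to be a Playwright Page, and the `.first()` call and the 30-second default timeout are choices made here for illustration, not something the answer prescribes.

```javascript
// Hypothetical helper (not a Playwright API): waits until an element containing
// markerText is visible, then returns the rendered HTML of the page.
async function captureAfterText(page, markerText, timeoutMs = 30000) {
    // Take the first match so the wait does not fail if the text appears
    // in more than one element.
    const marker = page.getByText(markerText).first();
    // 'visible' is already the default state; it is spelled out for clarity.
    await marker.waitFor({ state: 'visible', timeout: timeoutMs });
    return page.content();
}
```

In the script from the question, this could replace the bare `page.content()` call: `const content = await captureAfterText(page, 'Privacy Notice');` followed by `fs.writeFileSync('test.html', content);`.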

huangapple
  • Posted on February 10, 2023 at 15:48:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75408228.html