Extract page content with Playwright when the actual content loads after a delay
Question
I am trying to capture the privacy notice on this page: "https://www.imaginegolf.com/privacy". However, if you look at the page, the privacy notice takes a while to load. Is there a way to make Playwright wait for it and then grab the contents of the page? I have tried the waitUntil options load, networkidle, commit, and domcontentloaded.
Sample source code
import { chromium } from 'playwright'; // Web scraper library
import * as fs from 'fs';

(async function () {
  const chromeBrowser = await chromium.launch({ headless: true }); // Chromium launch and options
  const context = await chromeBrowser.newContext({
    ignoreHTTPSErrors: true,
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
  });
  const page = await context.newPage();
  await page.goto("https://www.imaginegolf.com/privacy", { waitUntil: 'networkidle', timeout: 60000 });
  let content = await page.content();
  fs.writeFileSync('test.html', content);
  console.log("done");
})();
Answer 1
Score: 1
You can add a check with expect and specify a timeout to verify that the privacy notice is visible. Something like this, perhaps:
// expect is imported from '@playwright/test'
await page.goto("https://www.imaginegolf.com/privacy");
await expect(page.locator('text="PRIVACY NOTICE"')).toBeVisible({ timeout: 5000 });
let content = await page.content();
fs.writeFileSync('test.html', content);
Just adapt the locator and timeout to your needs, and continue with your workflow after the expect succeeds.
Answer 2
Score: 1
The best way is probably to wait for an element that contains, or is part of, the content: basically something that indicates the content you expect has loaded. Since you're just using Playwright Library (not Playwright Test), you can't use expect to asynchronously wait for it to be visible within a time frame, but you're not trying to assert anything anyway, just wait for it before moving on. So I would recommend the waitFor method, like so:
await page.getByText('Privacy Notice').waitFor();
Or use whatever locator makes the most sense. Note that waitFor defaults to waiting until the element is visible, which is why you can call it without arguments, though you can always be explicit if you want.
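For completeness, here is a minimal sketch of how the waitFor approach could fit into the original script. The helper name captureWhenVisible, the 30-second default timeout, and the use of .first() (to sidestep a strict-mode error if the text happens to match more than one element) are assumptions for illustration, not part of either answer. Whether 'Privacy Notice' is the right text to wait for on that page is also an assumption carried over from the answer above.

```javascript
// Hypothetical helper: wait until an element containing `text` is visible,
// then return the rendered HTML of the page.
async function captureWhenVisible(page, text, timeoutMs = 30000) {
  // .first() is an assumption: it avoids a strict-mode error when the text
  // matches several elements. Drop it if your locator is already unique.
  const marker = page.getByText(text).first();
  // 'visible' is waitFor's default state; it is spelled out here for clarity.
  await marker.waitFor({ state: 'visible', timeout: timeoutMs });
  return page.content();
}

async function main() {
  // Dynamic imports keep the helper above usable without loading Playwright.
  const { chromium } = await import('playwright');
  const fs = await import('node:fs');

  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto('https://www.imaginegolf.com/privacy');
    const html = await captureWhenVisible(page, 'Privacy Notice');
    fs.writeFileSync('test.html', html);
    console.log('done');
  } finally {
    // Always close the browser so the Node process can exit.
    await browser.close();
  }
}

// main(); // uncomment to run against the live site (requires the playwright package)
```

If the wait times out, waitFor throws, so the file is only written once the marker element has actually appeared.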