February 10, 2023, 15:48:08

Extract page content where there is a delay in the actual content using Playwright

Question

I am trying to capture the privacy notice on this page: "https://www.imaginegolf.com/privacy". However, if you look at the page, the privacy notice takes a while to load. Is there a way to make Playwright wait and then grab the contents of the page? I tried the waitUntil options load, networkidle, commit, and domcontentloaded.

Sample source code

import { chromium } from 'playwright'; // Web scraper Library
import * as fs from 'fs';
(async function () {
    const chromeBrowser = await chromium.launch({ headless: true }); // Chromium launch and options
    const context = await chromeBrowser.newContext({
        ignoreHTTPSErrors: true,
        userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    });
    const page = await context.newPage();
    await page.goto("https://www.imaginegolf.com/privacy", { waitUntil: 'networkidle', timeout: 60000 });
    let content = await page.content();
    fs.writeFileSync('test.html', content);
    console.log("done");
})();

Answer 1

Score: 1

You can add a check with expect and specify a timeout to verify that the privacy notice is visible. Something like this, for example:

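  // expect ships with the @playwright/test package, so import it from there:
  // import { expect } from '@playwright/test';
  await page.goto("https://www.imaginegolf.com/privacy");
  // Wait up to 5 seconds for the notice heading to become visible.
  await expect(page.locator('text="PRIVACY NOTICE"')).toBeVisible({ timeout: 5000 });
  let content = await page.content();
  fs.writeFileSync('test.html', content);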

Just adapt the locator and timeout to your needs, and continue with your workflow after the expect succeeds.
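
If the notice never becomes visible within the timeout, the awaited check rejects and the script stops with an error. The following is a minimal sketch of one way to handle that case. The helper name waitSucceeded is hypothetical (it is not part of Playwright), and the usage comments assume the page, expect, and fs names from the snippets above.

```javascript
// Hypothetical helper (not a Playwright API): run a wait step and report
// whether it completed, instead of letting a timeout error end the script.
async function waitSucceeded(waitStep) {
  try {
    await waitStep(); // any function returning a promise that rejects on timeout
    return true;
  } catch (error) {
    // Log the reason (for a timeout, the message says what was being waited for).
    console.error(`Wait failed: ${error && error.message}`);
    return false;
  }
}

// Usage inside the scraping script (assumes page, expect and fs exist there):
//
//   const notice = page.locator('text="PRIVACY NOTICE"');
//   const visible = await waitSucceeded(
//     () => expect(notice).toBeVisible({ timeout: 5000 })
//   );
//   if (visible) {
//     fs.writeFileSync('test.html', await page.content());
//   }
```

Returning a boolean keeps the decision about what to do on failure (retry, skip the page, save whatever loaded) in the calling code.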

Answer 2

Score: 1

Probably the best way is to wait for an element that contains, or is part of, the content: basically, something that indicates the content you expect has loaded. Since you're using only the Playwright Library (not Playwright Test), you can't use expect to wait asynchronously for it to become visible within a time frame, but you aren't trying to assert anything anyway; you just want to wait for it before moving on. So I would recommend the waitFor method, like so:

await page.getByText('Privacy Notice').waitFor();

Or use whichever locator makes the most sense. Note that waitFor waits for the element to be visible by default, which is why no arguments are needed, though you can always pass the state explicitly if you want.
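
To show how the wait fits between navigation and page.content(), here is a minimal sketch with the state and timeout spelled out. The helper name contentAfterTextVisible and its default timeout are assumptions made for illustration. The getByText, waitFor, and content calls are the Playwright methods already used in this thread.

```javascript
// Hypothetical helper: wait until the given text is visible, then return
// the page's HTML. `page` is a Playwright Page supplied by the caller.
async function contentAfterTextVisible(page, text, timeoutMs = 30000) {
  // 'visible' is already waitFor's default state; it is written out here
  // only to make the behaviour explicit. The call rejects on timeout.
  await page.getByText(text).waitFor({ state: 'visible', timeout: timeoutMs });
  return page.content();
}

// Usage inside the original script (assumes chromeBrowser, context and fs):
//
//   const page = await context.newPage();
//   await page.goto('https://www.imaginegolf.com/privacy');
//   const html = await contentAfterTextVisible(page, 'Privacy Notice', 60000);
//   fs.writeFileSync('test.html', html);
//   await chromeBrowser.close();
```

If the text matches more than one element on the page, a narrower locator is likely needed, since waitFor expects the locator to resolve to a single element.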
