February 18, 2023, 03:55:56

Playwright works in headful mode but fails in headless

Question

I'm trying this sample to get the number of offers an NFT has on OpenSea:

import { test, expect } from '@playwright/test';
test('test', async ({ page }) => {
    await page.goto('https://opensea.io/assets/ethereum/0x63217dbb73e7a02c1d30f486e899ee66d0aa5e0b/6341');
    await page.waitForLoadState('networkidle');
    let selector = page.locator("[id='Body offers-panel'] li");
    const offers = await selector.count();
    console.log('Num of offers:', offers);
});

Then I run "npx playwright test", which always prints "Num of offers: 0".

But if I run it in --headed mode, it works perfectly and outputs "Num of offers: 5".

Can anyone explain this or help me understand it?

I tried using:

let selector = page.locator("[id='Body offers-panel'] li").waitFor();

I tried waiting until all requests are done:

await page.waitForLoadState('networkidle');

I tried waiting for the selector:

let selector = page.locator("[id='Body offers-panel'] li").first().waitFor();

But none of these worked. I always get a count of 0 unless I run the test in --headed mode, no matter which NFT address I try.

I would like to solve it or understand why this happens.

Answer 1

Score: 2

Some websites will not load the page if they detect a headless client. This is meant to prevent scraping and similar automated access. My guess is that this is what's happening here.

See:
Are you headless?
Detect Headless
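
As a rough illustration of how such detection can work, the sketch below checks two signals a site might inspect: the user agent string (headless Chromium has traditionally reported HeadlessChrome in it) and the navigator.webdriver flag. The ClientSignals type and the looksAutomated function are hypothetical names made up for this example; real anti-bot services use many more signals and do not publish their logic.

```typescript
// Hypothetical sketch: not the logic of any real site or anti-bot service.
interface ClientSignals {
  userAgent: string;  // e.g. the User-Agent request header
  webdriver: boolean; // value of navigator.webdriver reported by the page
}

// Returns true when either simple signal suggests an automated browser.
function looksAutomated(signals: ClientSignals): boolean {
  const headlessMarker = /HeadlessChrome/i.test(signals.userAgent);
  return headlessMarker || signals.webdriver;
}
```

A check this simple is easy to sidestep, which is why changing a single header can sometimes be enough on one site and useless on another.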

Answer 2

Score: 1

Headless mode makes it more obvious to servers that your script is a bot. You're being detected and blocked headlessly, but bypassing detection when running headfully.

Since you can't see anything, headless is a bit harder to debug than headful. Using console.log(await page.content()) and await page.screenshot({path: "test.png"}) are good strategies for figuring out why elements you expect to be on the page aren't.

In this case, adding

const text = (await page.textContent("body"))
  .replace(/ +/g, " ")
  .replace(/(\n ?)+/g, "\n")
  .trim();
console.log(text);

after goto to get the full text content of the page gives:

Access denied
Error code 1020
You do not have access to <Your URL>.The site owner may have set restrictions that prevent you from accessing the site.
Error details
Provide the site owner this information.
I got an error when visiting <Your URL>.
Error code: 1020
Ray ID: **************
Country: US
Data center: *****
IP: *****************
Timestamp: 2023-02-17 22:39:13 UTC
Click to copy
Was this page helpful?
Yes
No
Thank you for your feedback!
Performance & security by Cloudflare
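
To make this failure obvious instead of silently counting zero elements, a test could check the page text for the block page before querying any locators. The helper below is a minimal sketch; isCloudflareBlockPage is a hypothetical name, and the matched phrases are taken only from the output shown above, so other block pages may read differently.

```typescript
// Hypothetical helper: recognizes the Cloudflare "Access denied" page shown above.
// The matched phrases come from that one sample output and may not cover other variants.
function isCloudflareBlockPage(bodyText: string): boolean {
  // Collapse all whitespace runs and lowercase so line breaks and casing don't matter.
  const normalized = bodyText.replace(/\s+/g, " ").toLowerCase();
  return (
    normalized.includes("access denied") &&
    /error code:? 1020/.test(normalized)
  );
}
```

In a Playwright test this could be called with the result of page.textContent("body") right after page.goto, throwing a descriptive error when it returns true.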

It's not a perfect guarantee, but adding a user agent header is an easy option that seems to be enough to avoid headless detection on this particular site at this point in time:

import {expect, test} from "@playwright/test"; // ^1.30.0
const url = "<Your URL>";
const userAgent =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";
test.describe("with user agent", () => {
  test.use({userAgent});
  test("is able to retrieve offers", async ({page}) => {
    await page.goto(url);
    const selector = page.locator('[id="Body offers-panel"] li');
    const offers = await selector.count();
    console.log("Num of offers:", offers); // => Num of offers: 11
  });
});
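
The hard-coded string above pins an old Chrome 69 version. One alternative, sketched below under the assumption that the browser's default headless user agent differs from the headed one only by the HeadlessChrome token, is to derive the string from the browser's own default. toHeadedUserAgent is a hypothetical helper, the sample input is made up for illustration, and there is no guarantee the result gets past any particular site's checks.

```typescript
// Hypothetical helper: rewrites a headless Chromium user agent so it reads like
// a headed one. Assumes the only difference is the "HeadlessChrome" token.
function toHeadedUserAgent(userAgent: string): string {
  return userAgent.replace("HeadlessChrome", "Chrome");
}

// Illustrative input only; a real value would come from the running browser.
const headlessUa =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/110.0.5481.38 Safari/537.36";
console.log(toHeadedUserAgent(headlessUa));
```

The resulting string could then be passed to test.use({userAgent}) in the same way as the fixed string in the example above.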

Answer 3

Score: 1

Hello, I am using Playwright with Java and have a similar problem. Is it possible that there are other sources of behavioral differences apart from headless detection? I am pretty sure the site does not implement headless detection.
