Playwright is unable to launch chrome on Alpine Docker

Question
I am getting the following error when running my scraper (a Node.js application) on node:lts-alpine in Docker.
INFO PlaywrightCrawler: Starting the crawl
WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. browserType.launchPersistentContext: Failed to launch: Error: spawn /root/.cache/ms-playwright/chromium-1060/chrome-linux/chrome ENOENT
=========================== logs ===========================
<launching> /root/.cache/ms-playwright/chromium-1060/chrome-linux/chrome --disable-field-trial-config --disable-background-networking --enable-features=NetworkService,NetworkServiceInProcess --disable-background-timer-throttling --disable-backgrounding-occluded-windows --disable-back-forward-cache --disable-breakpad --disable-client-side-phishing-detection --disable-component-extensions-with-background-pages --disable-component-update --no-default-browser-check --disable-default-apps --disable-dev-shm-usage --disable-extensions --disable-features=ImprovedCookieControls,LazyFrameLoading,GlobalMediaControls,DestroyProfileOnBrowserClose,MediaRouter,DialMediaRouteProvider,AcceptCHFrame,AutoExpandDetailsElement,CertificateTransparencyComponentUpdater,AvoidUnnecessaryBeforeUnloadCheckSync,Translate --allow-pre-commit-input --disable-hang-monitor --disable-ipc-flooding-protection --disable-popup-blocking --disable-prompt-on-repost --disable-renderer-backgrounding --force-color-profile=srgb --metrics-recording-only --no-first-run --enable-automation --password-store=basic --use-mock-keychain --no-service-autorun --export-tagged-pdf --headless --hide-scrollbars --mute-audio --blink-settings=primaryHoverType=2,availableHoverTypes=2,primaryPointerType=4,availablePointerTypes=4 --no-sandbox --proxy-server=http://127.0.0.1:43519 --proxy-bypass-list=<-loopback> --disable-blink-features=AutomationControlled --user-data-dir=/tmp/playwright_chromiumdev_profile-oKPLgl --remote-debugging-pipe about:blank
[pid=N/A] starting temporary directories cleanup
[pid=N/A] finished temporary directories cleanup
============================================================
{"id":"Gl0EwOcnElHCOkr","url":"https://labs.withgoogle.com/","retryCount":1}
Here is my code:
const Apify = require('apify');
const Crawlee = require('crawlee');

const scrapeWebsiteUsingApify = async (source) => {
  const { Actor } = Apify;
  const { PlaywrightCrawler } = Crawlee;
  try {
    const sourceKey = source.replace(/[^a-zA-Z0-9]/g, '');
    await Actor.init();
    const store = await Actor.openKeyValueStore();
    // Check if data for the given source URL exists in the store
    const record = await store.getValue(sourceKey);
    if (record) {
      // If data exists in the store, return it directly
      return record;
    }
    // If data does not exist in the store, scrape the website
    let content;
    const crawler = new PlaywrightCrawler({
      async requestHandler({ page }) {
        await page.waitForTimeout(3000);
        // Capture the rendered page HTML
        content = await page.content();
      },
    });
    const crawledInfo = await crawler.run([source]);
    // Store the scraped data in the key-value store for future use
    await store.setValue(sourceKey, { ...crawledInfo, content });
    return { ...crawledInfo, content };
  } catch (e) {
    return null;
  }
};
The code works fine on my M1 MacBook Pro, but on deployment it fails to launch Chrome. I verified the location
/root/.cache/ms-playwright/chromium-1060/chrome-linux/chrome
It contains the chrome binary with the following permissions:
-rwxr-xr-x 1 root root 372244488 May 10 13:45 chrome
Since the error suggests the file cannot be located, I also checked $PATH and tried adding chrome's directory to it, but even that doesn't seem to work.
Could you help me understand this error? A possible fix would be much appreciated.
Answer 1
Score: 2
Playwright doesn't work with that image out of the box.
Related Playwright issue: https://github.com/microsoft/playwright/issues/2826
The error you observe is caused by missing shared-library dependencies. You can confirm this by logging in to the container and running ldd /root/.cache/ms-playwright/chromium-1060/chrome-linux/chrome.
Playwright has 'install-dependencies' command to fix this issue, but it works with apt-get
which is not available in alpine (alpine uses apk
).
There are also docker images from playwright team as well as there is alpine-chrome image that works by installing chrome specifically over alpine image.
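For example, basing the build on the official Playwright image sidesteps the dependency problem entirely, at the cost of leaving Alpine. The tag below is illustrative; match it to the Playwright version in your package.json.

```dockerfile
# Ubuntu-based image with browsers and all system dependencies preinstalled
FROM mcr.microsoft.com/playwright:v1.40.0-jammy

WORKDIR /app
COPY . .
RUN npm ci
CMD ["node", "main.js"]
```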
Answer 2
Score: 0
You should use the dedicated images from Apify - https://github.com/apify/actor-templates/blob/master/templates/js-crawlee-playwright-chrome/.actor/Dockerfile
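The linked template boils down to something like the following sketch; the base-image tag and npm invocation here are assumptions, so check the template itself for the current versions.

```dockerfile
# Apify base image with Node.js, Playwright and Chrome preinstalled
FROM apify/actor-node-playwright-chrome:18

COPY package*.json ./
RUN npm ci --omit=dev
COPY . ./
CMD npm start
```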