Puppeteer Uncatchable Target Closed Error

Question

I am having problems with Puppeteer.
Roughly once every 10 to 50 requests, I get the following error:

TargetCloseError: Protocol error (Network.getCookies): Target closed
at CallbackRegistry.clear (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Connection.js:153:36)
at CDPSessionImpl._onClosed (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Connection.js:468:70)
at Connection.onMessage (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Connection.js:265:25)
at WebSocket.<anonymous> (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/NodeWebSocketTransport.js:62:32)
at callListener (/app/node_modules/ws/lib/event-target.js:290:14)
at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:209:9)
at WebSocket.emit (node:events:513:28)
at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:1184:20)
at Receiver.emit (node:events:513:28)
at Receiver.dataMessage (/app/node_modules/ws/lib/receiver.js:541:14)
at Receiver.getData (/app/node_modules/ws/lib/receiver.js:459:17)
at Receiver.startLoop (/app/node_modules/ws/lib/receiver.js:158:22)
at Receiver._write (/app/node_modules/ws/lib/receiver.js:84:10)
at writeOrBuffer (node:internal/streams/writable:392:12)
at _write (node:internal/streams/writable:333:10)
at Writable.write (node:internal/streams/writable:337:10) 

The biggest problem is that the error cannot be caught, so it crashes my whole app.

I use await before every Puppeteer action. Here is all of my code that involves Puppeteer:

let browser = null;
try {
    browser = await puppeteer.launch({
        headless: "new",
        args: [
            `--user-agent=${userAgent}`,
            '--no-sandbox'
        ]
    });
    browser.on('disconnected', () => browserDisconnected = true);
    const page = await browser.newPage();

    // useProxy comes from the puppeteer-page-proxy package
    await useProxy(page, proxyUrl);
    await page.goto(url);
    // setTimeout is presumably the promise-based version from node:timers/promises
    await setTimeout(pageLoadTimeout);
    await page.waitForSelector("#Title");

    const renderedContent = await page.content();
    const $ = cheerio.load(renderedContent);

    const title = $("#Title").html();

    output = { title: title };
} catch (error) {
    console.log(error);
    output =  { error: true, error: error.message };
} finally {
    try {
        if (browser !== null) {
            await browser.close();
        }
    } catch (error) {
        console.log("ERROR CLOSING BROWSER:");
        console.log(error);
    }
    return output;
}

I've stripped out everything that doesn't involve Puppeteer.

I am absolutely clueless about how to handle this error, and I've read pretty much every Stack Overflow question and GitHub issue on this topic.
It would already help if I could catch the error so that it doesn't crash my whole server and force a restart.

Where am I running the server?
In a Docker container built with --platform=linux/amd64 node:18, deployed on Google App Engine.

Any help would be much appreciated!

Answer 1

Score: 2

Thanks to a hint in the right direction by @Yaroslavm, I've now found the cause and solution for this particular problem.

The issue was with await useProxy(page, proxyUrl); from the puppeteer-page-proxy package. The error occurred internally within the package.
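
One plausible reading of why the error looked uncatchable, offered as an assumption rather than a confirmed diagnosis of puppeteer-page-proxy: if a library rejects a promise that it never returns to the caller, a surrounding try/catch with await cannot see it. The sketch below uses a hypothetical stand-in, leakyLibraryCall, which is not the package's real code, and shows a process-level 'unhandledRejection' listener acting as a last-resort safety net.

```javascript
// Hypothetical stand-in for a library call that starts background work
// without returning or awaiting it (fire-and-forget).
function leakyLibraryCall() {
  // This rejected promise is never handed back to the caller,
  // so no try/catch around the call can observe it.
  Promise.reject(new Error('Target closed (simulated)'));
  return Promise.resolve('ok');
}

// Messages of rejections that escaped every try/catch.
const escaped = [];

// Last-resort safety net: with a listener registered, Node.js reports the
// rejection here instead of terminating the process.
process.on('unhandledRejection', (reason) => {
  const message = reason instanceof Error ? reason.message : String(reason);
  escaped.push(message);
  console.error('Unhandled rejection:', message);
});

async function run() {
  try {
    // Only the returned promise is awaited; it resolves normally.
    return await leakyLibraryCall();
  } catch (error) {
    // Never reached for the background rejection.
    return `caught: ${error.message}`;
  }
}
```

Calling run() resolves normally while the listener logs the background rejection. A listener like this keeps the server alive, but it only hides the symptom; removing the call that leaks the rejection is still the actual fix.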

Luckily there is another way to use a proxy with Puppeteer.

I've found the following solution (https://pixeljets.com/blog/how-to-set-proxy-in-puppeteer/):

const proxyChain = require('proxy-chain');

const oldProxyUrl = `http://${proxy.username}:${proxy.password}@${proxy.address}:${proxy.port}`;
// anonymizeProxy starts a local forwarding proxy that needs no credentials
const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);

browser = await puppeteer.launch({
   headless: true,
   args: [
     `--user-agent=${userAgent}`,
     '--no-sandbox',
     `--proxy-server=${newProxyUrl}`
   ]
});

const page = await browser.newPage();

Done!

The link above mentions more options, which I have yet to test.
I hope this helps!
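
One detail worth guarding against when building the proxy URL by string interpolation: a username or password containing characters such as @ or : would produce an ambiguous URL. A minimal sketch, assuming the same proxy object shape as above; buildProxyUrl is a hypothetical helper, not part of proxy-chain or Puppeteer, and it is worth checking that whatever consumes the URL decodes percent-encoded credentials.

```javascript
// Hypothetical helper: builds a proxy URL and percent-encodes the credentials
// so that characters like "@" and ":" cannot be mistaken for URL delimiters.
function buildProxyUrl({ username, password, address, port }) {
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `http://${user}:${pass}@${address}:${port}`;
}

// Example values are made up for illustration.
const exampleProxyUrl = buildProxyUrl({
  username: 'user',
  password: 'p@ss:word',
  address: 'proxy.example.com',
  port: 8000,
});

console.log(exampleProxyUrl);
// prints "http://user:p%40ss%3Aword@proxy.example.com:8000"
```

The result can be passed to proxyChain.anonymizeProxy in place of the interpolated string.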

Answer 2

Score: 0

Puppeteer usually behaves this way when the script finishes executing while some actions have not been properly awaited (their promises are still pending).

Are these functions async?

> const $ = cheerio.load(renderedContent);
>
> const title = $("#Title").html();

If so, try awaiting them so that they resolve before the script continues.
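
A quick way to answer that question for any call is to check whether its return value has a then method. As far as I know, cheerio's load and html are synchronous, so the sketch below uses hypothetical stand-ins (parseTitleSync and parseTitleAsync) rather than cheerio itself.

```javascript
// Returns true when the value behaves like a promise (has a callable "then").
function isThenable(value) {
  return value != null && typeof value.then === 'function';
}

// Hypothetical synchronous parser, standing in for a call like $("#Title").html().
function parseTitleSync(html) {
  const match = /<h1 id="Title">(.*?)<\/h1>/.exec(html);
  return match ? match[1] : null;
}

// Hypothetical async variant: an async function always returns a promise.
async function parseTitleAsync(html) {
  return parseTitleSync(html);
}

const sampleHtml = '<h1 id="Title">Hello</h1>';

console.log(isThenable(parseTitleSync(sampleHtml)));  // prints false
console.log(isThenable(parseTitleAsync(sampleHtml))); // prints true
```

Awaiting a value that is not thenable is harmless, so adding await to a synchronous call changes nothing; only a missing await on a real promise can leave work pending.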

huangapple
  • Published on July 5, 2023 at 00:30:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76614478.html