Puppeteer Uncatchable Target Closed Error
Question
I'm having a problem with Puppeteer.
Every 10 to 50 requests, I get the following error:
TargetCloseError: Protocol error (Network.getCookies): Target closed
    at CallbackRegistry.clear (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Connection.js:153:36)
    at CDPSessionImpl._onClosed (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Connection.js:468:70)
    at Connection.onMessage (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/Connection.js:265:25)
    at WebSocket.<anonymous> (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/common/NodeWebSocketTransport.js:62:32)
    at callListener (/app/node_modules/ws/lib/event-target.js:290:14)
    at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:209:9)
    at WebSocket.emit (node:events:513:28)
    at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:1184:20)
    at Receiver.emit (node:events:513:28)
    at Receiver.dataMessage (/app/node_modules/ws/lib/receiver.js:541:14)
    at Receiver.getData (/app/node_modules/ws/lib/receiver.js:459:17)
    at Receiver.startLoop (/app/node_modules/ws/lib/receiver.js:158:22)
    at Receiver._write (/app/node_modules/ws/lib/receiver.js:84:10)
    at writeOrBuffer (node:internal/streams/writable:392:12)
    at _write (node:internal/streams/writable:333:10)
    at Writable.write (node:internal/streams/writable:337:10)
The biggest problem is that the error cannot be caught, so it crashes my whole app.
I use await before every Puppeteer action. Here is all of my code that involves Puppeteer:
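const puppeteer = require('puppeteer');
const cheerio = require('cheerio');
const useProxy = require('puppeteer-page-proxy');
const { setTimeout } = require('node:timers/promises'); // promise-based sleep

// Hypothetical wrapper name: the original snippet runs inside an async handler.
async function scrapeTitle(url, proxyUrl, userAgent, pageLoadTimeout) {
    let browser = null;
    let browserDisconnected = false;
    let output;
    try {
        browser = await puppeteer.launch({
            headless: "new",
            args: [
                `--user-agent=${userAgent}`,
                '--no-sandbox'
            ]
        });
        browser.on('disconnected', () => { browserDisconnected = true; });
        const page = await browser.newPage();
        await useProxy(page, proxyUrl);
        await page.goto(url);
        await setTimeout(pageLoadTimeout);
        await page.waitForSelector("#Title");
        const renderedContent = await page.content();
        const $ = cheerio.load(renderedContent);
        const title = $("#Title").html();
        output = { title: title };
    } catch (error) {
        console.log(error);
        // Distinct keys: with { error: true, error: ... } the second value silently overwrote the first.
        output = { error: true, message: error.message };
    } finally {
        try {
            if (browser !== null) {
                await browser.close();
            }
        } catch (error) {
            console.log("ERROR CLOSING BROWSER:");
            console.log(error);
        }
    }
    return output;
}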
I've stripped out every task that doesn't involve Puppeteer.
I have no idea how to handle this error, and I've read pretty much every Stack Overflow, GitHub, etc. issue on this topic.
Even just being able to catch the error would help, so that it doesn't crash my whole server and force a restart.
Where am I running the server?
In a Docker container (--platform=linux/amd64 node:18) on Google App Engine.
Any help would be very appreciated!
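If the immediate goal is only to keep the server alive, Node has a process-level hook for this. On Node 15 and later (including the node:18 image above), an unhandled promise rejection terminates the process by default unless an `unhandledRejection` listener is registered. A minimal safety-net sketch, assuming the crash surfaces as an unhandled rejection; the helper names `isTargetClosedError` and `installRejectionGuard` are hypothetical, and this contains the symptom rather than fixing its cause:

```javascript
// Hypothetical helper names; only process.on / process.off are Node APIs.

// True for the error Puppeteer raises when a page, browser or CDP session
// disappears while a protocol command is still pending. The message check
// is a fallback for errors that don't carry the TargetCloseError name.
function isTargetClosedError(err) {
  return Boolean(err) &&
    (err.name === 'TargetCloseError' || /Target closed/.test(String(err.message)));
}

// Registers an 'unhandledRejection' listener. While a listener is present,
// Node no longer terminates the process on unhandled rejections, so the
// callbacks decide what happens next. Returns a function that removes it.
function installRejectionGuard(onTargetClosed, onOther) {
  const handler = (reason) => {
    if (isTargetClosedError(reason)) {
      onTargetClosed(reason);
    } else {
      onOther(reason);
    }
  };
  process.on('unhandledRejection', handler);
  return () => process.off('unhandledRejection', handler);
}
```

A typical setup calls it once at startup: the first callback logs and ignores target-closed errors, and the second logs and calls process.exit(1), so unrelated bugs still fail fast instead of being silently swallowed.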
Answer 1
Score: 2
Thanks to a hint in the right direction by @Yaroslavm, I've now found the cause and solution for this particular problem.
The issue was the call await useProxy(page, proxyUrl); from the puppeteer-page-proxy package. The error was thrown internally within the package.
Luckily there is another way to use a proxy with Puppeteer.
I've found the following solution (https://pixeljets.com/blog/how-to-set-proxy-in-puppeteer/):
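const puppeteer = require('puppeteer');
const proxyChain = require('proxy-chain');

// Runs inside the same async function as before; browser, userAgent and a proxy
// object with username/password/address/port are assumed to exist there.
const oldProxyUrl = `http://${proxy.username}:${proxy.password}@${proxy.address}:${proxy.port}`;
// Chrome's --proxy-server flag cannot carry credentials, so proxy-chain starts a
// local unauthenticated proxy that forwards to the authenticated upstream one.
const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);
browser = await puppeteer.launch({
    headless: true, // a boolean, not the string "true"
    args: [
        `--user-agent=${userAgent}`,
        '--no-sandbox',
        `--proxy-server=${newProxyUrl}`
    ]
});
const page = await browser.newPage();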
Done!
The linked article mentions more options, which I haven't tested yet.
I hope this helps!
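The same approach can be packaged as a reusable helper. This is a sketch under a few assumptions: the `proxy` object has the `username`/`password`/`address`/`port` fields used above, `puppeteer` and `proxy-chain` are passed in rather than required (so the cleanup order can be checked with fakes), and the names `buildProxyUrl` and `withProxiedPage` are hypothetical. It adds two things to the snippet: credentials are percent-encoded through the WHATWG URL API (a raw `@` or `:` in the password would otherwise corrupt the URL), and the local proxy server started by `anonymizeProxy()` is shut down with `closeAnonymizedProxy()` once the browser is closed, so repeated requests don't leave local proxy servers running.

```javascript
// Builds the upstream proxy URL. Assigning username/password through the URL
// API percent-encodes reserved characters such as '@', ':' and '/'.
function buildProxyUrl(proxy) {
  const url = new URL(`http://${proxy.address}:${proxy.port}`);
  url.username = proxy.username;
  url.password = proxy.password;
  return url.href;
}

// Runs task(page) in a browser routed through an anonymized local proxy and
// always cleans up. In real code, pass
// deps = { puppeteer: require('puppeteer'), proxyChain: require('proxy-chain') }.
async function withProxiedPage(deps, { proxy, userAgent }, task) {
  const { puppeteer, proxyChain } = deps;
  const localProxyUrl = await proxyChain.anonymizeProxy(buildProxyUrl(proxy));
  let browser = null;
  try {
    browser = await puppeteer.launch({
      headless: true,
      args: [`--user-agent=${userAgent}`, '--no-sandbox', `--proxy-server=${localProxyUrl}`],
    });
    const page = await browser.newPage();
    return await task(page);
  } finally {
    if (browser !== null) {
      // close() may reject if Chrome already died; log it instead of masking the task's outcome.
      await browser.close().catch((err) => console.log('ERROR CLOSING BROWSER:', err));
    }
    // true = also terminate connections still open through the local proxy.
    await proxyChain.closeAnonymizedProxy(localProxyUrl, true);
  }
}
```

Usage would look like `const title = await withProxiedPage(deps, { proxy, userAgent }, (page) => page.title());`. Passing the page work in as a callback keeps every browser and proxy lifetime inside one try/finally, which is the property the original code was aiming for.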
Answer 2
Score: 0
Puppeteer usually behaves this way when execution has finished but some actions were not properly awaited (their promises are still pending).
Are these functions async?
> const $ = cheerio.load(renderedContent);
>
> const title = $("#Title").html();
If so, try awaiting them until they resolve.
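The effect of an action that nobody awaits, which the accepted answer traced to code running inside puppeteer-page-proxy, can be reproduced without Puppeteer. This is an illustration under assumptions, not that library's actual code: a hypothetical fake page emits a 'request' event, and its listener starts an async call that fails with the same message as Network.getCookies after the target closed, but never awaits or catches it. Because EventEmitter calls listeners synchronously and ignores their return values, the rejection never reaches the caller's try/catch.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical stand-in for a page that emits 'request' events.
const fakePage = new EventEmitter();

// Hypothetical stand-in for a CDP call that fails because the target is gone.
async function getCookies() {
  throw new Error('Protocol error (Network.getCookies): Target closed');
}

// Library-style listener: it starts async work but neither awaits nor catches it,
// so the rejected promise is simply dropped.
fakePage.on('request', () => {
  getCookies();
});

async function scrapeWithFloatingPromise() {
  try {
    fakePage.emit('request'); // listeners run synchronously and return immediately
    return 'ok';              // reached, so the catch block below never runs
  } catch (err) {
    return 'caught';
  }
}
```

Calling scrapeWithFloatingPromise() resolves to 'ok'; the rejection surfaces separately as an 'unhandledRejection' event, which on Node 15 and later crashes the process unless a listener handles it. Adding await in the outer function cannot help here; the rejection has to be handled inside the listener itself, for example with .catch() on the call.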