How to prevent Puppeteer from crawling my website content


Question

I know that Puppeteer is a simple and powerful tool that makes it easy to scrape website data.

As far as I know, a headless browser exposes many properties that differ from a normal browser.

But if I use the following method to connect to an already-open browser, can it still be detected?

First, edit the desktop Chrome shortcut so that the browser starts with remote debugging enabled:
C:\Users\13632\AppData\Local\Google\Chrome\Application\chrome.exe --remote-debugging-port=9222

const axios = require('axios')
const puppeteer = require('puppeteer')

async function main() {
    // Ask the running Chrome instance for its DevTools WebSocket endpoint.
    const response = await axios.get(`http://127.0.0.1:9222/json/version`);
    const webSocketDebuggerUrl = response.data.webSocketDebuggerUrl;

    // Attach Puppeteer to the already-open browser instead of launching a new one.
    const browser = await puppeteer.connect({
        browserWSEndpoint: webSocketDebuggerUrl,
        ignoreDefaultArgs: ["--enable-automation"],
        slowMo: 100,
        defaultViewport: { width: 1280, height: 600 },
    });

    // Find the tab that already has the target site open and take control of it.
    const target = await browser.waitForTarget(t => t.url().includes("your url"))
    const page = await target.page();
}

main()

The method above connects to an already-open browser, which is an ordinary Chrome instance. It seems impossible to tell that it is being driven by an automation tool. Is there any other way to judge whether the other party is a human or a machine?
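For reference, a quick probe run from inside main() above (a sketch, assuming the connection succeeded and `page` points at the target tab) illustrates why the usual headless fingerprints do not show up in this setup:

// Sketch: add inside main() after `const page = await target.page();`
// In a manually launched Chrome started only with --remote-debugging-port,
// these values look like a normal browser: no "HeadlessChrome" marker in the
// user agent, and navigator.webdriver is not set to true.
const webdriverFlag = await page.evaluate(() => navigator.webdriver);
const userAgent = await page.evaluate(() => navigator.userAgent);
console.log({ webdriverFlag, userAgent });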


Answer 1

Score: 1

Browser profiling and automation detection (and beating it) is an entire subfield of its own. Some drivers (chromedriver; I've not used Puppeteer) set flags to indicate automated use, but these are easily defeated. (See, for instance, undetected chromedriver, a package that tries not to be detectable.)

Then there's user profiling (bots tend to click in predictable ways), running JS in the browser to try to detect the environment, blacklisting IPs (most bots are behind proxies), and so on.
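As a rough illustration of the "run JS in the browser" idea, here is a minimal sketch (not the answerer's code) that collects a few weak signals and scores them. Every one of these can be spoofed, so treat the score as a hint rather than proof; the /bot-telemetry endpoint is hypothetical.

// Minimal browser-side probe: each signal is weak and easily spoofed on its own,
// so several are combined into a simple score.
function collectAutomationSignals() {
  const signals = {
    // true when Chrome runs headless or was started with --enable-automation
    webdriverFlag: navigator.webdriver === true,
    // older headless builds exposed no plugins and no languages
    noPlugins: navigator.plugins.length === 0,
    noLanguages: !navigator.languages || navigator.languages.length === 0,
    // window.chrome exists in regular desktop Chrome but was missing in old headless builds
    missingChromeObject: typeof window.chrome === 'undefined',
  };
  const score = Object.values(signals).filter(Boolean).length;
  return { signals, score };
}

// Report the result to the server (hypothetical endpoint) so it can feed
// rate-limiting or challenge decisions.
navigator.sendBeacon('/bot-telemetry', JSON.stringify(collectAutomationSignals()));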

Ask yourself: what are you afraid of? Then defend against that. Anything you put on the Internet can and will be crawled, but you can make it hard to do disruptive things like booking all the concert tickets and then reselling them at a 500% markup. Specific challenges like that have specific answers; but there is no foolproof way to detect automated browsers, and chasing one is a waste of effort.
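As one concrete example of making disruptive actions expensive, here is a sketch of naive per-IP rate limiting in Express (an assumed stack; the window and limit numbers are arbitrary). A real deployment would use a shared store such as Redis and combine this with the behavioral and browser signals mentioned above.

const express = require('express');

const app = express();
const WINDOW_MS = 60 * 1000;   // 1-minute window (arbitrary)
const MAX_REQUESTS = 30;       // per IP per window (arbitrary)
const hits = new Map();        // in-memory only; use a shared store in production

// Count requests per client IP and reject clients that exceed the limit.
app.use((req, res, next) => {
  const now = Date.now();
  const entry = hits.get(req.ip) || { count: 0, windowStart: now };
  if (now - entry.windowStart > WINDOW_MS) {
    entry.count = 0;
    entry.windowStart = now;
  }
  entry.count += 1;
  hits.set(req.ip, entry);
  if (entry.count > MAX_REQUESTS) {
    return res.status(429).send('Too many requests');
  }
  next();
});

app.get('/tickets', (req, res) => res.send('ticket listing'));
app.listen(3000);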


huangapple
  • Posted on 2023-01-09 00:33:47
  • Please keep this link when reposting: https://go.coder-hub.com/75049507.html