Puppeteer always loads mobile script on remote server

Question

I'm trying to scrape the scripts loaded by this URL with headless Puppeteer. When I run it on my local machine, I get the desktop script: "https://vidstat.taboola.com/lite-unit/4.1.0/UnitFeedManagerDesktop.min.js".

The problem is that when I call the scraping API on a remote server (via Postman), I always get a script that should only be loaded on mobile devices:
"https://vidstat.taboola.com/lite-unit/4.1.0/UnitFeedManagerMobile.min.js"

This is my code:

public async fetchScripts(url: string, waitFor = 'cdn-pipes.js') {
  const page = await this.browser.newPage();

  try {
    await page.goto(url, {timeout: 10000, waitUntil: 'domcontentloaded'});
    // Wait until the marker script (or "spa-detector") appears in the HTML;
    // with no marker given, wait for the document to finish loading.
    const func = waitFor ? `document.documentElement.innerHTML.indexOf("${waitFor}") !== -1 || document.documentElement.innerHTML.indexOf("spa-detector") !== -1` :
      'document.readyState === "complete"';
    // A timeout here is only logged; the scrape continues with whatever has loaded.
    await page.waitForFunction(func, {polling: 500, timeout: 8000}).catch(reason => {
      console.error('page.waitForFunction', {error: reason, url});
    });

    // Collect the unique taboola.com script URLs present on the page.
    const pageUrls = await page.evaluate(() => {
      const urlArray = Array.from(document.scripts).map((link) => link.src).filter(value => value.includes('taboola.com'));

      return [...new Set(urlArray)];
    });

    console.log('fetchMinimal - urlsArray ', {pageUrls});

    return pageUrls;
  } catch (e) {
    console.error('fetchMinimal - error ', e);
  } finally {
    await page.close();
  }
}

I suspect this is a CDN issue where old scripts are somehow being cached, but I'm not sure. Any thoughts?

UPDATE:

It happens because the page loads the mobile script only if
window.matchMedia("only screen and (min-device-width: 320px) and (max-device-width: 480px)").matches is true, and that always evaluates to true in chromium-browser.
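One way to confirm this on a given machine is to ask the browser what it reports for the screen width and for that media query. The following is only a sketch: `PageLike`, `ScreenProbe`, `wouldMatchMobileQuery` and `probeScreen` are hypothetical names introduced for illustration, and `PageLike` is a minimal stand-in for a page object exposing `evaluate()` (Puppeteer's `Page` has such a method and is expected to fit).

```typescript
// The media query quoted in the update above.
const MOBILE_QUERY =
  'only screen and (min-device-width: 320px) and (max-device-width: 480px)';

// Hypothetical minimal shape of a page object (an assumption, not a
// Puppeteer type). Only evaluate() with a string expression is needed.
interface PageLike {
  evaluate(expression: string): Promise<unknown>;
}

// What the probe reports back from the browser.
interface ScreenProbe {
  screenWidth: number;
  matchesMobileQuery: boolean;
}

// Pure model of the query: true when the device width is within [320, 480] px.
function wouldMatchMobileQuery(deviceWidth: number): boolean {
  return deviceWidth >= 320 && deviceWidth <= 480;
}

// Evaluate the screen width and the media query inside the page.
async function probeScreen(page: PageLike): Promise<ScreenProbe> {
  const expression = `({
    screenWidth: window.screen.width,
    matchesMobileQuery: window.matchMedia(${JSON.stringify(MOBILE_QUERY)}).matches
  })`;
  return (await page.evaluate(expression)) as ScreenProbe;
}
```

Calling `probeScreen(page)` right after `page.goto(...)` on both the local machine and the remote server would show whether the two environments report different screen widths.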

Answer 1

Score: 0

Thanks to this answer, I managed to solve it by passing args: ['--window-size=1920,1080'] to puppeteer.launch.
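A minimal sketch of that fix follows. To keep it dependency-free, the launcher is passed in as a parameter; `LaunchOptionsLike`, `LauncherLike`, `windowSizeArg` and `launchDesktopBrowser` are hypothetical names introduced here, and the assumption is that the `puppeteer` object, whose `launch` method accepts an `args` array, can be supplied as the launcher.

```typescript
// Hypothetical minimal types (assumptions, not Puppeteer's own typings).
interface LaunchOptionsLike {
  args: string[];
}

interface LauncherLike<B> {
  launch(options: LaunchOptionsLike): Promise<B>;
}

// Build the Chromium flag from the answer for a given window size.
function windowSizeArg(width: number, height: number): string {
  return `--window-size=${width},${height}`;
}

// Launch a browser with a desktop-sized window (1920x1080 by default).
async function launchDesktopBrowser<B>(
  launcher: LauncherLike<B>,
  width = 1920,
  height = 1080,
): Promise<B> {
  return launcher.launch({ args: [windowSizeArg(width, height)] });
}

// Possible usage, assuming puppeteer is installed and imported:
//   const browser = await launchDesktopBrowser(puppeteer);
```

Keeping the size in parameters makes it easy to try other desktop resolutions if 1920x1080 does not suit the target page.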

huangapple
  • Posted on July 18, 2023 at 06:53:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76708545.html