Puppeteer always loads mobile script on remote server
Question
I'm trying to scrape (headless) this URL's scripts, and I notice that whenever I run it on my local machine I get the "https://vidstat.taboola.com/lite-unit/4.1.0/UnitFeedManagerDesktop.min.js" script.
The issue is that when I call the scraping API on a remote server (via Postman), I always get a script that should appear only on mobile devices:
"https://vidstat.taboola.com/lite-unit/4.1.0/UnitFeedManagerMobile.min.js"
This is my code:
public async fetchScripts(url: string, waitFor = 'cdn-pipes.js') {
    const page = await this.browser.newPage();
    try {
        await page.goto(url, {timeout: 10000, waitUntil: 'domcontentloaded'});
        // Wait until the marker script (or the SPA detector) shows up in the DOM;
        // with an empty waitFor, wait for the document to finish loading instead.
        const func = waitFor ? `document.documentElement.innerHTML.indexOf("${waitFor}") !== -1 || document.documentElement.innerHTML.indexOf("spa-detector") !== -1` :
            'document.readyState === "complete"';
        // A timeout here is logged but does not abort the scrape.
        await page.waitForFunction(func, {polling: 500, timeout: 8000}).catch(reason => {
            console.error('page.waitForFunction', {error: reason, url});
        });
        // Collect the unique taboola.com script URLs present on the page.
        const pageUrls = await page.evaluate(() => {
            const urlArray = Array.from(document.scripts).map((link) => link.src).filter(value => value.includes('taboola.com'));
            return [...new Set(urlArray)];
        });
        console.log('fetchMinimal - urlsArray ', {pageUrls});
        return pageUrls;
    } catch (e) {
        console.error('fetchMinimal - error ', e);
    } finally {
        await page.close();
    }
}
I suspect this is a CDN issue, perhaps one that is somehow serving old cached scripts, but I don't know. Any thoughts?
UPDATE:
It happens because the page loads the mobile script only if
window.matchMedia(" only screen and (min-device-width : 320px) and (max-device-width : 480px)").matches
is true, which is always the case on chromium-browser.
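To confirm this on the remote server, something like the rough sketch below could be run against the page. The names MOBILE_QUERY, PROBE_EXPRESSION and wouldLoadMobileScript are made up for illustration, and the sketch assumes that the device-width media feature corresponds to window.screen.width in CSS pixels.

```typescript
// The media query quoted above, kept as a constant so it can be reused.
const MOBILE_QUERY =
  'only screen and (min-device-width: 320px) and (max-device-width: 480px)';

// Expression to hand to page.evaluate(), which also accepts a string.
// It reports the screen size the page sees and whether the query matches.
const PROBE_EXPRESSION = `({
  screenWidth: window.screen.width,
  screenHeight: window.screen.height,
  matchesMobile: window.matchMedia(${JSON.stringify(MOBILE_QUERY)}).matches
})`;

// Hypothetical helper: a pure mirror of the query's width condition, useful
// for reasoning about which script a given screen width would receive.
function wouldLoadMobileScript(deviceWidthPx: number): boolean {
  return deviceWidthPx >= 320 && deviceWidthPx <= 480;
}
```

Inside fetchScripts, right after page.goto, logging `await page.evaluate(PROBE_EXPRESSION)` on both machines should show whether the remote browser reports a screen width in the 320 to 480 px range.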
Answer 1
Score: 0
Thanks to this answer, I managed to solve it by passing args: ['--window-size=1920,1080'] to puppeteer.launch.
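For reference, a minimal sketch of what those launch options might look like. The helper name desktopLaunchOptions is hypothetical, and setting defaultViewport as well is my own assumption rather than part of the original fix (Puppeteer otherwise defaults the viewport to 800x600).

```typescript
// Hypothetical helper: builds the options object passed to puppeteer.launch().
function desktopLaunchOptions(width = 1920, height = 1080) {
  return {
    headless: true,
    // Flag from the answer: sizes the browser window.
    args: [`--window-size=${width},${height}`],
    // Assumption: also pin the page viewport to the same desktop size.
    defaultViewport: { width, height },
  };
}

// Usage (requires the puppeteer package):
//   import puppeteer from 'puppeteer';
//   const browser = await puppeteer.launch(desktopLaunchOptions());
```

With a desktop-sized window the media query from the update should no longer match, so the page should load the Desktop script instead of the Mobile one.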