Error with Permissions-Policy header: Origin trial controlled feature not enabled: 'interest-cohort' when using Chromedriver in headless mode

Question

I am trying to scrape restaurant names and phone numbers from https://www.manta.com/. I am using Selenium to automate the task because the site is dynamic. When I run my Python code, I get the following message:
"Error with Permissions-Policy header: Origin trial controlled feature not enabled: 'interest-cohort'." and the output is an empty list.

Can you help me find where I went wrong, and also suggest any books or websites where I can learn how to scrape dynamic websites?

This is my Python code for reference.

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time

# Set the path to the ChromeDriver executable
webdriver_service = Service('C:\webdrivers')

# Set Chrome options for running in headless mode
chrome_options = Options()
chrome_options.add_argument('--headless')

# Create a new ChromeDriver instance
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)

# URL of the website you want to scrape
url ="https://www.manta.com/search?search=Restaurants&context=unknown&search_source=nav&city=Dallas&state=Texas&country=United%20States&pt=32.7936%2C-96.7662&device=desktop&screenResolution=1280x720"

# Navigate to the website
driver.get(url)
time.sleep(5)

# Find all the restaurant shop cards
listings = driver.find_elements(By.XPATH, "/html/body/main/div[3]/div[1]/div[1]/div[2]")

print(listings) #print all the restaurant shop cards

# Close the driver and quit the browser
driver.quit()

This is the output I got in VS Code:

DevTools listening on ws://127.0.0.1:49483/devtools/browser/88be036f-53d8-4f6a-b9ff-5b103ad5e6ff
[0525/152704.760:INFO:CONSOLE(0)] "Error with Permissions-Policy header: Origin trial controlled feature not enabled: 'interest-cohort'.", 
source:  (0)
[0525/152705.055:INFO:CONSOLE(0)] "Error with Permissions-Policy header: Origin trial controlled feature not enabled: 'interest-cohort'.", 
source:  (0)
[]

Answer 1

Score: 1

It's because the website uses Cloudflare for bot detection, which blocks your Selenium-driven Chrome instance from loading the page.

I have updated your locator to get all the shop cards and print their text, and switched to undetected_chromedriver, which is designed not to trigger anti-bot services like Cloudflare and automatically downloads and patches the driver binary.
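
The locator change comes down to the last step of the XPath: a trailing `div[2]` matches only the second child, while a trailing `div` matches every child. Here is a minimal sketch of that difference, using Python's standard-library ElementTree on made-up markup. The `<container>` element and the card names are hypothetical and are not taken from manta.com.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the results container.
doc = ET.fromstring(
    "<container>"
    "<div>Card A</div>"
    "<div>Card B</div>"
    "<div>Card C</div>"
    "</container>"
)

# Trailing div[2]: only the second child <div> is matched.
second_only = [el.text for el in doc.findall("./div[2]")]

# Trailing div: every child <div> is matched.
all_cards = [el.text for el in doc.findall("./div")]

print(second_only)  # ['Card B']
print(all_cards)    # ['Card A', 'Card B', 'Card C']
```

The same positional rule applies to the XPath passed to `find_elements`, which is why dropping the `[2]` returns one element per card instead of a single element.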

Full code

import time
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By

# Create a new ChromeDriver instance
options = uc.ChromeOptions()
options.add_argument("--headless=new")
driver = uc.Chrome(options=options)

# URL of the website you want to scrape
url = "https://www.manta.com/search?search=Restaurants&context=unknown&search_source=nav&city=Dallas&state=Texas&country=United%20States&pt=32.7936%2C-96.7662&device=desktop&screenResolution=1280x720"

# Navigate to the website
driver.get(url)
time.sleep(5)
# Find all the restaurant shop cards
listings = driver.find_elements(By.XPATH, "/html/body/main/div[3]/div[1]/div[1]/div")
# print all the restaurant shop cards
for listing in listings:
    print(listing.text)

# Close the driver and quit the browser
driver.quit()

Prints

QualitY Restaurants Dallas
1130 S Bowen Rd
Dallas, TX
(336) 536-4955
Visit Website
CLAIMED
Categori...
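
Since the goal was names and phone numbers, each card's text could be split further. The following is only a sketch, assuming the first non-empty line of a card is the business name and that phone numbers look like the one in the sample output. `parse_listing` and `PHONE_RE` are hypothetical helper names, and real cards may need a different rule.

```python
import re

# Assumed phone format, based on the sample output: (336) 536-4955
PHONE_RE = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")


def parse_listing(text):
    """Return (name, phone) for one card's text; either may be None."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # Assumption: the first non-empty line is the business name.
    name = lines[0] if lines else None
    match = PHONE_RE.search(text)
    phone = match.group(0) if match else None
    return name, phone


# Card text copied from the printed output above (truncated part omitted).
sample = """QualitY Restaurants Dallas
1130 S Bowen Rd
Dallas, TX
(336) 536-4955
Visit Website
CLAIMED"""

print(parse_listing(sample))
# ('QualitY Restaurants Dallas', '(336) 536-4955')
```

In the loop above, `parse_listing(listing.text)` could replace `print(listing.text)` to collect one (name, phone) pair per card.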

huangapple
  • Posted on May 25, 2023, 18:17:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76331208.html