May 25, 2023 18:17:36


Error with Permissions-Policy header: Origin trial controlled feature not enabled: 'interest-cohort' when using Chromedriver in headless mode

Question


I am trying to scrape restaurant names and their phone numbers from the https://www.manta.com/ website. I am using Selenium to automate the whole task because the website is dynamic. When I run the Python code, I get the following message:
"Error with Permissions-Policy header: Origin trial controlled feature not enabled: 'interest-cohort'." and the output is an empty list.

Can you help me point out where I went wrong? Could you also suggest any books or websites where I can learn how to scrape dynamic websites?

Here is my Python code for reference:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import time

# Set the path to the ChromeDriver executable
webdriver_service = Service('C:\webdrivers')

# Set Chrome options for running in headless mode
chrome_options = Options()
chrome_options.add_argument('--headless')

# Create a new ChromeDriver instance
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)

# URL of the website you want to scrape
url = "https://www.manta.com/search?search=Restaurants&context=unknown&search_source=nav&city=Dallas&state=Texas&country=United%20States&pt=32.7936%2C-96.7662&device=desktop&screenResolution=1280x720"

# Navigate to the website
driver.get(url)
time.sleep(5)

# Find all the restaurant shop cards
listings = driver.find_elements(By.XPATH, "/html/body/main/div[3]/div[1]/div[1]/div[2]")

print(listings) #print all the restaurant shop cards

# Close the driver and quit the browser
driver.quit()

This is the error I got in VS Code:

DevTools listening on ws://127.0.0.1:49483/devtools/browser/88be036f-53d8-4f6a-b9ff-5b103ad5e6ff
[0525/152704.760:INFO:CONSOLE(0)] "Error with Permissions-Policy header: Origin trial controlled feature not enabled: 'interest-cohort'.", source:  (0)
[0525/152705.055:INFO:CONSOLE(0)] "Error with Permissions-Policy header: Origin trial controlled feature not enabled: 'interest-cohort'.", source:  (0)
[]

Answer 1

Score: 1


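It's because the website uses Cloudflare for bot detection, and Cloudflare blocks your Selenium-driven Chrome instance from loading the page, so your XPath matches nothing and you get an empty list. The 'interest-cohort' line itself is only an INFO-level console message from Chrome about the site's Permissions-Policy header; it is not what causes the empty result.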
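One way to check this diagnosis in your own script is to look at what Selenium actually received before searching for cards. The sketch below is a heuristic, not a Cloudflare or Selenium API: the function name looks_like_cloudflare_challenge is hypothetical, and the marker strings are assumptions about how Cloudflare's interstitial page typically looks, which may change at any time.

```python
# Heuristic sketch (hypothetical helper): does the page Selenium received look
# like a Cloudflare challenge instead of the real search results?
# ASSUMPTION: these marker strings are typical of Cloudflare's interstitial
# page; they are not a documented or stable contract.
CHALLENGE_MARKERS = (
    "just a moment",                 # assumed title of the challenge page
    "/cdn-cgi/challenge-platform/",  # assumed path of the challenge script
)


def looks_like_cloudflare_challenge(title: str, page_source: str) -> bool:
    """Return True if the title or HTML contains any assumed challenge marker."""
    haystack = f"{title}\n{page_source}".lower()
    return any(marker in haystack for marker in CHALLENGE_MARKERS)


# With Selenium you would pass driver.title and driver.page_source instead.
print(looks_like_cloudflare_challenge("Just a moment...", "<html></html>"))       # True
print(looks_like_cloudflare_challenge("Restaurants in Dallas", "<main></main>"))  # False
```

If this returns True right after driver.get(url), the empty list most likely comes from the blocked page rather than from the locator.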

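I have updated your locator to get all of the shop cards (the XPath now ends in /div instead of /div[2], which matched only the second child) and print their text. I have also switched to undetected_chromedriver, which is designed not to trigger anti-bot services like Cloudflare, and which automatically downloads the driver binary and patches it.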
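The effect of dropping the trailing [2] can be seen offline with Python's standard xml.etree.ElementTree, whose limited XPath support also treats div[2] as "the second div child", as XPath does in the browser. The markup below is a made-up toy that only mimics the shape of the path after /html/body/main; it is not Manta's real page, and in Selenium the browser evaluates the XPath instead.

```python
import xml.etree.ElementTree as ET

# Toy markup (made up): it only mimics the shape of the path after
# /html/body/main in the question and is NOT Manta's real page.
TOY_PAGE = """
<main>
  <div>header</div>
  <div>filters</div>
  <div>
    <div>
      <div>
        <div>Card A</div>
        <div>Card B</div>
        <div>Card C</div>
      </div>
    </div>
  </div>
</main>
"""

main = ET.fromstring(TOY_PAGE.strip())

# Question's locator: the trailing [2] keeps only the second card.
old = main.findall("div[3]/div[1]/div[1]/div[2]")
# Answer's locator: no index, so every card div is returned.
new = main.findall("div[3]/div[1]/div[1]/div")

print([card.text for card in old])  # ['Card B']
print([card.text for card in new])  # ['Card A', 'Card B', 'Card C']
```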

Full code

import time
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By

# Create a new ChromeDriver instance
options = uc.ChromeOptions()
options.add_argument("--headless=new")
driver = uc.Chrome(options=options)

# URL of the website you want to scrape
url = "https://www.manta.com/search?search=Restaurants&context=unknown&search_source=nav&city=Dallas&state=Texas&country=United%20States&pt=32.7936%2C-96.7662&device=desktop&screenResolution=1280x720"

# Navigate to the website
driver.get(url)
time.sleep(5)
# Find all the restaurant shop cards
listings = driver.find_elements(By.XPATH, "/html/body/main/div[3]/div[1]/div[1]/div")
# print all the restaurant shop cards
for listing in listings:
    print(listing.text)

# Close the driver and quit the browser
driver.quit()

Prints

QualitY Restaurants Dallas
1130 S Bowen Rd
Dallas, TX
(336) 536-4955
Visit Website
CLAIMED
Categori...
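Since the goal is restaurant names and phone numbers, each card's text still has to be split up. Below is a minimal sketch that assumes, based only on the single sample card printed above, that the name is the first non-empty line and the phone number is a US-style "(XXX) XXX-XXXX" string; parse_card and PHONE_RE are hypothetical names, and real cards may need a different rule.

```python
import re

# ASSUMPTION: US-style numbers like "(336) 536-4955", as in the sample output.
PHONE_RE = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")


def parse_card(card_text: str) -> dict:
    """Split one card's visible text into a name and its first phone number.

    ASSUMPTION (from one sample card): the business name is the first
    non-empty line. Missing fields come back as None.
    """
    lines = [line.strip() for line in card_text.splitlines() if line.strip()]
    name = lines[0] if lines else None
    match = PHONE_RE.search(card_text)
    phone = match.group(0) if match else None
    return {"name": name, "phone": phone}


sample = """QualitY Restaurants Dallas
1130 S Bowen Rd
Dallas, TX
(336) 536-4955
Visit Website
CLAIMED
Categori..."""

print(parse_card(sample))  # {'name': 'QualitY Restaurants Dallas', 'phone': '(336) 536-4955'}
```

Inside the answer's loop you could call print(parse_card(listing.text)) in place of print(listing.text).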
