Error in Databricks Selenium "WebDriverException: Message: unknown error: no chrome binary at C:\Program Files\Google\Chrome\Application Stacktrace:"
Question
I am trying to do web scraping using Selenium with Chrome in Azure Databricks. My code is below.
%pip install selenium
%pip install webdriver_manager
from selenium import webdriver
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ExpectedConditions
from selenium.webdriver.chrome.options import Options
# Specify the path to the uploaded chromedriver file
chrome_driver_path = '/dbfs/FileStore/Chromedriver/chromedriver'
chrome_service = Service(chrome_driver_path)
# Configure Chrome options
options = Options()
options.binary_location = "C:\Program Files\Google\Chrome\Application"
options.add_argument('--headless') # Run Chrome in headless mode (without GUI)
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--disable-gpu")
# Create a new Chrome webdriver instance
driver = webdriver.Chrome(service=chrome_service, options=options)
# Example usage: Open a website and print the page title
url = "https://data.cms.gov/tools/mapping-medicare-disparities-by-population"
driver.get(url)
# Clean up and quit the webdriver
driver.quit()
However, I am getting the following error:
WebDriverException: Message: unknown error: no chrome binary at C:\Program Files\Google\Chrome\Application
Stacktrace:
Answer 1
Score: 1
Try the code below.
options = Options()
options.add_argument('--headless')
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--disable-gpu")
Using a shell command, download ChromeDriver version 113 to /tmp/chromedriver_linux64.zip.
%sh
wget -N https://chromedriver.storage.googleapis.com/113.0.5672.63/chromedriver_linux64.zip -O /tmp/chromedriver_linux64.zip
Unzip the file.
%sh
unzip /tmp/chromedriver_linux64.zip -d /tmp/chromedriver113/
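Install Chrome. Note that google-chrome-stable installs whatever the current stable release is, so the ChromeDriver downloaded above only works while its major version (113) matches the installed Chrome.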
%sh
sudo curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add
sudo echo "deb https://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list
sudo apt-get -y update
sudo apt-get -y install google-chrome-stable
Now get the data from the URL.
browser = webdriver.Chrome(service=Service('/tmp/chromedriver113/chromedriver'), options=options)
url = "https://data.cms.gov/tools/mapping-medicare-disparities-by-population"
browser.get(url)
browser.title
Follow this solution for more information.
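Because the steps above pin ChromeDriver to 113 while apt installs the current stable Chrome, it can help to compare the two major versions before starting the browser. The following is only a minimal sketch: the helper names (major_version, versions_match, read_version) are hypothetical, and it assumes both binaries print a four-part version number when run with --version.

```python
import re
import subprocess


def major_version(version_output):
    """Extract the major version from text such as 'Google Chrome 113.0.5672.126'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """Return True when Chrome and ChromeDriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


def read_version(command):
    """Run a command such as ['google-chrome', '--version'] and return its stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

On the cluster this could be called as versions_match(read_version(['google-chrome', '--version']), read_version(['/tmp/chromedriver113/chromedriver', '--version'])). The google-chrome command name is an assumption about what the package puts on PATH. If the check returns False, download the ChromeDriver release that matches the installed Chrome instead.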
Answer 2
Score: 0
Try the following:
%pip install selenium
%pip install webdriver_manager
from selenium import webdriver
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as ExpectedConditions
from selenium.webdriver import ChromeOptions
options = ChromeOptions()
options.add_argument('--headless')
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--disable-gpu")
# Create a new Chrome webdriver instance
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
url = "https://data.cms.gov/tools/mapping-medicare-disparities-by-population"
driver.get(url)
driver.quit()
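The issue with your code is that options.binary_location points to a Windows path (C:\Program Files\Google\Chrome\Application), which does not exist on a Databricks cluster because the cluster nodes run Linux. Remove that line, or set it to the full path of a Chrome executable that is actually installed on the cluster.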
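If you do want to set binary_location explicitly, one option is to look the executable up on PATH instead of hard-coding it. This is a rough sketch rather than a definitive approach: find_chrome_binary is a hypothetical helper, and the candidate names are assumptions about what a Linux Chrome or Chromium package typically installs.

```python
import shutil

# Executable names to try, in order. These are assumptions about a typical
# Linux install, not something guaranteed on every cluster image.
CANDIDATE_NAMES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium-browser",
    "chromium",
)


def find_chrome_binary(names=CANDIDATE_NAMES, which=shutil.which):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in names:
        path = which(name)
        if path:
            return path
    return None
```

If find_chrome_binary() returns a path, it can be assigned to options.binary_location. If it returns None, no browser is installed under those names, which is the situation the "no chrome binary" error describes, and Chrome has to be installed on the cluster first.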