Browser for Python Selenium (or other remote control library) without manual download requirements

Question


A very small part of my code base needs to automatically control a browser. I've tried using Requests and BeautifulSoup - which I've used successfully on other projects - but I'm missing something and getting junk back from the server.

I'm open to using Selenium for what I need (or some other solution). I've managed to get it working but I've found it requires manual download of a driver (I think). For other people that use my library I'd like something that can be installed automatically or some default option that doesn't require installation.

My question, then: is there a way to remote control a browser on Windows that just works after a pip call or two? I'd prefer a Selenium solution, but it appears that every browser in Selenium now requires an additional manual download. Is that correct, or am I missing something?

EDIT: I opened an issue on the Selenium repo asking for automated driver installation:
https://github.com/SeleniumHQ/selenium/issues/7922

Answer 1

Score: 1


PYTHON

Edit1:

You can use webdriver_manager to handle this scenario; it also takes care of setting the permissions on the downloaded driver file (os.chmod). Here is the pip installation for webdriver_manager:

pip install webdriver-manager

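Below is a sample script for Chrome.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# install() downloads chromedriver if it is not already cached and returns
# the path to the executable. Passing the path positionally works on
# Selenium 3; Selenium 4 expects it to be wrapped in a Service object.
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get("https://www.google.com")
driver.quit()  # the parentheses matter: a bare `driver.quit` never closes the browser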

Old Answer:

============================================================
This makes sure chromedriver is always the latest stable version, and you don't have to do any manual steps.

Another advantage of webdriver_manager is that you can download any supported driver on the fly. Below is a simple example for Firefox (GeckoDriver).

from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager

# executable_path is the Selenium 3 keyword; Selenium 4 takes a Service object instead
driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())
driver.get("https://www.google.com")
driver.quit()
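
If a library has to support several browsers, the choice of driver manager can be looked up by name. The following is a minimal sketch, not part of webdriver_manager itself: MANAGERS, resolve_manager and install_driver are hypothetical names, and the module paths assume webdriver_manager's usual package layout (chrome, firefox, microsoft).

```python
import importlib

# Module path and class name of the driver manager for each browser.
# The entries follow webdriver_manager's package layout; the helper
# functions below are hypothetical and only illustrate the idea.
MANAGERS = {
    "chrome": ("webdriver_manager.chrome", "ChromeDriverManager"),
    "firefox": ("webdriver_manager.firefox", "GeckoDriverManager"),
    "edge": ("webdriver_manager.microsoft", "EdgeChromiumDriverManager"),
}


def resolve_manager(browser):
    """Return (module_path, class_name) for a browser name."""
    key = browser.strip().lower()
    if key not in MANAGERS:
        raise ValueError(f"unsupported browser: {browser!r}")
    return MANAGERS[key]


def install_driver(browser):
    """Download the driver for `browser` if needed and return its path."""
    module_path, class_name = resolve_manager(browser)
    # Imported lazily so that only this call needs webdriver-manager installed.
    module = importlib.import_module(module_path)
    manager_cls = getattr(module, class_name)
    return manager_cls().install()
```

Calling install_driver("firefox") would then return a path that can be handed to Selenium in the same way as in the examples above.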

Here is some quick-and-dirty code that downloads the latest chromedriver dynamically on Windows.

import requests
import wget  # third-party: pip install wget
import zipfile
import os

# get the latest chromedriver version number
url = 'https://chromedriver.storage.googleapis.com/LATEST_RELEASE'
response = requests.get(url)
response.raise_for_status()
version_number = response.text.strip()  # drop any trailing whitespace

# build the download url
download_url = "https://chromedriver.storage.googleapis.com/" + version_number + "/chromedriver_win32.zip"

# download the zip file using the url built above
latest_driver_zip = wget.download(download_url, 'chromedriver.zip')

# extract the zip file
with zipfile.ZipFile(latest_driver_zip, 'r') as zip_ref:
    zip_ref.extractall()  # you can specify the destination folder path here

# delete the zip file downloaded above
os.remove(latest_driver_zip)
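
The same steps can be done with the standard library alone, which avoids the extra requests and wget dependencies. This is a rough sketch under a few assumptions: build_download_url, extract_driver and download_latest_driver are hypothetical names, the version string in the docstring is only an example, and as far as I know the LATEST_RELEASE endpoint only covers ChromeDriver up to version 114, so newer releases would need a different source.

```python
import io
import urllib.request
import zipfile

BASE_URL = "https://chromedriver.storage.googleapis.com"


def build_download_url(version, platform="win32"):
    """Build the zip URL for a chromedriver version, e.g. '79.0.3945.36'."""
    # strip() guards against a trailing newline in the LATEST_RELEASE body
    return f"{BASE_URL}/{version.strip()}/chromedriver_{platform}.zip"


def extract_driver(zip_bytes, destination="."):
    """Extract a downloaded zip (given as bytes) and return the member names."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(destination)
        return archive.namelist()


def download_latest_driver(destination="."):
    """Fetch the latest version number, then download and unpack the driver."""
    with urllib.request.urlopen(f"{BASE_URL}/LATEST_RELEASE") as response:
        version = response.read().decode("utf-8")
    with urllib.request.urlopen(build_download_url(version)) as response:
        return extract_driver(response.read(), destination)
```

Because the zip is unpacked from memory, there is no temporary file to delete afterwards.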

RUBY

Here is the solution implemented in Ruby for Windows.

require 'net/http'
require 'open-uri'
require 'fileutils'
require 'zip' # provided by the rubyzip gem

# Extract the contents of the zip file into the destination folder
def extract_zip(file, destination)
  # create the destination folder if it does not exist
  FileUtils.mkdir_p(destination)
  Zip::File.open(file) do |zip_file|
    zip_file.each do |f|
      fpath = File.join(destination, f.name)
      zip_file.extract(f, fpath) unless File.exist?(fpath)
    end
  end
end

# URL that reports the latest chromedriver version
url = 'https://chromedriver.storage.googleapis.com/LATEST_RELEASE'

# get the latest driver version
parsed_url = URI.parse(url)
http = Net::HTTP.new(parsed_url.host, parsed_url.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER
request = Net::HTTP::Get.new(parsed_url.request_uri)
request["Accept"] = 'application/json'
response = http.request(request)
version = response.body.strip # drop any trailing newline

# build the download URL from the version fetched above
download_url = "https://chromedriver.storage.googleapis.com/#{version}/chromedriver_win32.zip"
puts download_url

# build the temporary location for the downloaded zip
download_path = File.join(ENV['TEMP'], version.gsub('.', '_')) + '.zip'
puts download_path

# download the zip file (URI.open comes from open-uri; a bare `open` on a URL no longer works on Ruby 3)
File.open(download_path, "wb") do |file|
  file.write URI.open(download_url).read
end

# extract chromedriver.exe from the zip into the given folder
extract_zip(download_path, "my_destination_folder_path") # pass a folder, not a file name

# delete the zip file
FileUtils.rm_rf(download_path)

I run this .rb file whenever I get the exception saying that the driver does not support the installed Chrome version, then resume execution once the new chromedriver has been downloaded. That way, my scripts never fail because of a Chrome version change.
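
The recover-and-retry idea can be sketched in Python as well. This is only an illustration: run_with_driver_recovery and looks_like_version_error are hypothetical names, the session-starting and driver-refreshing steps are passed in as callables, and the message check is a heuristic based on the wording ChromeDriver typically uses for a version mismatch.

```python
def looks_like_version_error(error):
    """Heuristic (an assumption): match the wording of the mismatch message."""
    return "only supports chrome version" in str(error).lower()


def run_with_driver_recovery(start_session, refresh_driver,
                             is_version_error=looks_like_version_error):
    """Start a session; on a version mismatch, refresh the driver and retry once."""
    try:
        return start_session()
    except Exception as error:
        if not is_version_error(error):
            raise  # unrelated failure: do not mask it
        refresh_driver()  # e.g. download the latest chromedriver
        return start_session()  # a second failure propagates to the caller
```

Here start_session would create the Selenium driver and refresh_driver would run a download step like the ones shown earlier; retrying only once keeps a persistent mismatch from looping forever.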

huangapple
  • Posted on January 6, 2020, 02:32:54
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59602984.html