Selenium Screenshots are weird and bad when zoomed in
Question
I'm trying to take screenshots of some LaTeX formulas on my website, and I want to automate it with Selenium. At low zoom, the script produces usable screenshots, but the LaTeX equations come out at low resolution, which is not ideal.
At high zoom, if I inspect the element and save a screenshot manually, the pictures are great, but Selenium fails to take the same screenshots and produces bad ones instead.
Independently of zoom, some screenshots are just large white rectangles. What is the solution to this whole mess?
Here is my script:
You'll need to run `pip install selenium pyautogui webdriver-manager` first.
If you run it, it will create a temp folder in the run directory and put the images in there.
import os
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
# from selenium.webdriver.chrome.options import Options
import pyautogui

dry_run = False # if True, don't save anything
save_to_temp_folder = True # if True, save to temp folder instead of content/formulas (for previewing)

# chrome_options = Options()
# chrome_options.add_argument("--headless")
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
driver.get('http://stemformulas.com/formulas')
driver.implicitly_wait(10)

def zoom_in(times=1):
    for i in range(times):
        pyautogui.hotkey('command', '+')

# Find the list items within the specified <ul> element
ul_element = driver.find_element(By.CSS_SELECTOR, 'ul.flex.flex-row.mt-8')
list_items = ul_element.find_elements(By.TAG_NAME, 'li')
# Count the number of list items. Subtract one for the next button.
page_count = len(list_items) - 1
print(f"Found {page_count} pages of formulas")
# increase zoom to 400% for higher quality screenshots
# zoom_in(4)
# driver.execute_script("document.body.style.zoom = '200%'")
for i in range(1, page_count + 1): # iterate over pages
    print("Visiting page: ", i)
    # visit formulas page
    driver.get(f'http://stemformulas.com/formulas/page/{i}')
    # Find the section with class "grid-container mt-6"
    section = driver.find_element(By.CSS_SELECTOR, 'section.grid-container.mt-6')
    # Find all anchor elements within the section
    anchors = section.find_elements(By.TAG_NAME, 'a')
    num_anchors = len(anchors)
    print(f"Found ", num_anchors, " formula grid items")
    # Iterate over the formula grid items
    for i in range(num_anchors):
        # if I iterate over the anchors they become stale so refetch section, anchors every time
        section = driver.find_element(By.CSS_SELECTOR, 'section.grid-container.mt-6')
        anchors = section.find_elements(By.TAG_NAME, 'a')
        anchor = anchors[i]
        href = anchor.get_attribute('href')
        x_path = f"//*[@id=\"main-content\"]/section[2]/a[{i+1}]/div[1]"
        div = anchor.find_element(By.XPATH, x_path)
        div.location_once_scrolled_into_view # scroll into view
        # http://stemformulas.com/formulas/<folder_name>/
        folder_name = href.split('/')[-2]
        output_image_path = os.path.join("content", "formulas", folder_name, "preview.png")
        if dry_run:
            print(f"Would have screenshot div {div} to {output_image_path}")
        elif save_to_temp_folder:
            if not os.path.exists("temp"):
                os.mkdir("temp")
            if not os.path.exists(os.path.join("temp", folder_name)):
                os.mkdir(os.path.join("temp", folder_name))
            new_path = os.path.join("temp", folder_name, "preview.png")
            bits = div.screenshot_as_png
            with open(new_path, 'wb') as f:
                f.write(bits)
            print(f"Screenshot div {div} to {new_path}")
            time.sleep(1)
        else:
            bits = div.screenshot_as_png
            with open(output_image_path, 'wb') as f:
                f.write(bits)
            print(f"Screenshot div {div} to {output_image_path}")
            time.sleep(1) # if we don't sleep, elements become stale
Some stuff I've tried:
- Using `driver.execute_script("document.body.style.zoom = '400%'")`: led to even worse pictures.
- Headless mode: did not resolve the issue.
Answer 1
Score: 1
Perhaps the script hasn't given the browser enough time to finish rendering the content. Try waiting for rendering to finish, for example when you look up the section:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
wait = WebDriverWait(driver, 10)
# The locator must be passed as a single (by, value) tuple
section = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'section.grid-container.mt-6')))