June 26, 2023 14:12:33

Selenium Screenshots are weird and bad when zoomed in

Question

I'm trying to take screenshots of some LaTeX formulas on my website, and I want to automate it using Selenium. At low zoom, the script captures the right regions, but the LaTeX equations come out at low resolution, which is not ideal.

At high zoom, if I inspect an element and save a screenshot by hand, the pictures are great, but Selenium fails to take the same screenshots and produces bad ones instead.

Independently of zoom, some screenshots are just large white rectangles. I am wondering what the solution to this whole mess is.

Here is my script:

You'll need to `pip install selenium pyautogui webdriver-manager`.

If you run it, it will create a temp folder in the run directory and put the images in there.

import os
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
# from selenium.webdriver.chrome.options import Options
import pyautogui
dry_run = False  # if True, don't save anything
save_to_temp_folder = True  # if True, save to temp folder instead of content/formulas (for previewing)
# chrome_options = Options()
# chrome_options.add_argument("--headless")
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
driver.get('http://stemformulas.com/formulas')
driver.implicitly_wait(10)
def zoom_in(times=1):
    # Sends Cmd and + through the OS keyboard (macOS), so the browser window must have focus
    for _ in range(times):
        pyautogui.hotkey('command', '+')
# Find the list items within the specified <ul> element
ul_element = driver.find_element(By.CSS_SELECTOR, 'ul.flex.flex-row.mt-8')
list_items = ul_element.find_elements(By.TAG_NAME, 'li')
# Count the number of list items. Subtract one for the next button.
page_count = len(list_items) - 1
print(f"Found {page_count} pages of formulas")
# increase zoom to 400% for higher quality screenshots
# zoom_in(4)
# driver.execute_script("document.body.style.zoom = '200%'")
for page in range(1, page_count + 1):  # iterate over pages
    print("Visiting page:", page)
    # visit formulas page
    driver.get(f'http://stemformulas.com/formulas/page/{page}')
    # Find the section with class "grid-container mt-6"
    section = driver.find_element(By.CSS_SELECTOR, 'section.grid-container.mt-6')
    # Find all anchor elements within the section
    anchors = section.find_elements(By.TAG_NAME, 'a')
    num_anchors = len(anchors)
    print(f"Found {num_anchors} formula grid items")
    # Iterate over the formula grid items (separate index so the page counter is not shadowed)
    for idx in range(num_anchors):
        # if I iterate over the anchors they become stale so refetch section, anchors every time
        section = driver.find_element(By.CSS_SELECTOR, 'section.grid-container.mt-6')
        anchors = section.find_elements(By.TAG_NAME, 'a')
        anchor = anchors[idx]
        href = anchor.get_attribute('href')
        x_path = f'//*[@id="main-content"]/section[2]/a[{idx + 1}]/div[1]'
        div = anchor.find_element(By.XPATH, x_path)
        div.location_once_scrolled_into_view  # scroll into view
        # http://stemformulas.com/formulas/<folder_name>/
        folder_name = href.split('/')[-2]
        output_image_path = os.path.join("content", "formulas", folder_name, "preview.png")

        if dry_run:
            print(f"Would have screenshot div {div} to {output_image_path}")
        elif save_to_temp_folder:
            # create temp/<folder_name> if it does not exist yet
            os.makedirs(os.path.join("temp", folder_name), exist_ok=True)
            new_path = os.path.join("temp", folder_name, "preview.png")
            bits = div.screenshot_as_png
            with open(new_path, 'wb') as f:
                f.write(bits)
            print(f"Screenshot div {div} to {new_path}")
            time.sleep(1)
        else:
            bits = div.screenshot_as_png
            with open(output_image_path, 'wb') as f:
                f.write(bits)
            print(f"Screenshot div {div} to {output_image_path}")
            time.sleep(1)  # if we don't sleep, elements become stale

Some stuff I've tried:

Using driver.execute_script("document.body.style.zoom = '400%'") - led to even worse pictures
Running headless - did not resolve the issue
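
One option not in the list above is to raise the browser's device pixel ratio instead of the page zoom, so the layout stays the same while each CSS pixel is rendered with more physical pixels. The sketch below is only an illustration under assumptions: `build_chrome_args` and `make_driver` are hypothetical helper names, the scale and window size are arbitrary, and it assumes a Chrome build that accepts the `--force-device-scale-factor`, `--window-size` and `--headless=new` switches and a Selenium version that can locate the driver on its own. Whether it fixes the white rectangles has not been verified against this site.

```python
def build_chrome_args(scale=2.0, width=1280, height=1024, headless=True):
    """Return Chrome command-line switches for a fixed-size, high-DPI window."""
    args = [
        # Render each CSS pixel with `scale` device pixels per axis.
        f"--force-device-scale-factor={scale:g}",
        # A fixed window size keeps the page layout reproducible between runs.
        f"--window-size={width},{height}",
    ]
    if headless:
        args.append("--headless=new")
    return args


def make_driver(scale=2.0):
    """Start Chrome with the switches above (hypothetical helper)."""
    # Imported lazily so build_chrome_args can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in build_chrome_args(scale):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    # Show the switches that would be passed to Chrome.
    print(build_chrome_args(2))
```

With a driver created this way, `div.screenshot_as_png` would be called exactly as in the script, with no `pyautogui` zooming and no `document.body.style.zoom` changes.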

Answer 1

Score: 1

Perhaps the script hasn't given the browser enough time to finish rendering the content. Try waiting for it to finish, for example when you look up the section:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
wait = WebDriverWait(driver, 10)
# visibility_of_element_located takes a single (by, value) locator tuple
section = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'section.grid-container.mt-6')))
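
The same idea can be extended from "the section is visible" to "the page has finished loading and its web fonts are ready", which may matter for math rendered with web fonts. The following is a minimal sketch, not a confirmed fix. `wait_until`, `wait_for_page_ready` and `PAGE_READY_JS` are names made up for this example, and it assumes the page's formulas depend on fonts tracked by `document.fonts`, which the question does not state. The polling loop is written by hand so the sketch runs without Selenium; any object with an `execute_script` method works as `driver`.

```python
import time


def wait_until(predicate, timeout=10.0, poll=0.25):
    """Call predicate() until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# JavaScript that reports whether the document has loaded and web fonts are ready.
PAGE_READY_JS = (
    "return document.readyState === 'complete'"
    " && document.fonts.status === 'loaded';"
)


def wait_for_page_ready(driver, timeout=10.0):
    """Block until the page reports that loading and font loading have finished."""
    return wait_until(lambda: driver.execute_script(PAGE_READY_JS), timeout)
```

In the script from the question, `wait_for_page_ready(driver)` would go after each `driver.get(...)` and before `div.screenshot_as_png`. Selenium's own `WebDriverWait(driver, 10).until(lambda d: d.execute_script(PAGE_READY_JS))` should behave similarly.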
