2023-02-08 20:53:32

PDF Generation out of an images list takes too long - Python

Question

I'm trying to generate a PDF from a list of 3 images, but this step has become the bottleneck in my program, taking up to 30 seconds per PDF. I need to process a very large number of images, so that is far too slow. None of the solutions I have tried so far has helped much. The three images I'm testing with are 60 KB, 125 KB and 134 KB respectively.

I've tried using PIL, which takes around 27 seconds per PDF. I used the following code:

import os
from PIL import Image

def pil_pdf():  # 27 sec
    downloads = r"C:\Users\USER\Downloads"
    file_nmbr = 3
    imagelist = []
    for i in range(1, file_nmbr + 1):
        # Open 1.png, 2.png, ... and convert each to RGB before saving as PDF
        current_image = Image.open(os.path.join(downloads, f"{i}.png")).convert("RGB")
        imagelist.append(current_image)
    out_folder = os.path.join(r"C:\Users\USER\Downloads", "out_vPIL.pdf")
    # save_all writes the first image plus every image in append_images as pages
    imagelist[0].save(out_folder, save_all=True, append_images=imagelist[1:])

I also tried FPDF:

import os
from fpdf import FPDF

def new_pdf():  # 25 sec
    downloads = r"C:\Users\USER\Downloads"
    file_nmbr = 3
    imagelist = []
    for i in range(1, file_nmbr + 1):
        imagelist.append(os.path.join(downloads, f"{i}.png"))
    pdf = FPDF()
    for image in imagelist:
        pdf.add_page()
        # Stretch the image over a full A4 page (210 x 297 mm)
        pdf.image(image, 0, 0, 210, 297)
    pdf.output(os.path.join(r"C:\Users\USER\Downloads", "out.pdf"))

I'd like to get the time down to about 10 seconds per PDF, but nothing I've found so far has helped.

Thanks so much for any suggestions or recommendations!

Answer 1

Score: 0

My bet is that you will see the best performance with PyMuPDF:

import fitz  # PyMuPDF

imglist = [...]  # your list of image filenames
doc = fitz.open()  # new empty PDF
for ifile in imglist:
    idoc = fitz.open(ifile)  # open the image file as a document
    pdfbytes = idoc.convert_to_pdf()  # one-page PDF held in memory
    doc.insert_pdf(fitz.open("pdf", pdfbytes))  # append that page to the output
doc.save("myimages.pdf", garbage=3, deflate=True)
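To check this on your own machine, it may help to time each variant the same way. The helper below is only a sketch: `time_call` is a made-up name, not part of PIL, FPDF or PyMuPDF, and it assumes the function under test can safely be run several times (here that just means overwriting the output PDF).

```python
import time


def time_call(func, *args, repeat=3, **kwargs):
    """Run func(*args, **kwargs) `repeat` times.

    Returns (best_seconds, last_result). The best of several runs is
    reported so that a one-off slow run (cold disk cache, antivirus
    scan, etc.) does not dominate the measurement.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()  # high-resolution wall-clock timer
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result
```

For example, `seconds, _ = time_call(pil_pdf)` and `seconds, _ = time_call(new_pdf)` would time the two functions from the question, and the PyMuPDF loop can be wrapped in a function and timed the same way. If even a single `Image.open` call takes seconds for files this small, the slowdown is probably outside the PDF library.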
