PDF generation from a list of images takes too long - Python

Question

I'm trying to generate a PDF from a list of 3 images, but this step is the bottleneck in my program: it takes up to 30 seconds per PDF. I need to process a very large number of images, so that is far too slow. None of the solutions I have tried so far has helped much. The three images I'm testing with are 60 KB, 125 KB and 134 KB respectively.

I've tried PIL, which takes around 27 seconds per PDF. I used the following code:

import os
from PIL import Image  # Pillow

def pil_pdf():  # 27 sec
    downloads = r"C:\Users\USER\Downloads"
    file_nmbr = 3
    imagelist = []
    for i in range(1, file_nmbr + 1):
        # Open 1.png, 2.png, ... and convert each to RGB before saving as PDF
        current_image = Image.open(os.path.join(downloads, f"{i}.png")).convert("RGB")
        imagelist.append(current_image)

    out_path = os.path.join(downloads, "out_vPIL.pdf")
    # save_all + append_images writes the remaining images as additional pages
    imagelist[0].save(out_path, save_all=True, append_images=imagelist[1:])

I also tried FPDF:

import os
from fpdf import FPDF

def new_pdf():  # 25 sec
    downloads = r"C:\Users\USER\Downloads"
    file_nmbr = 3
    imagelist = []
    for i in range(1, file_nmbr + 1):
        imagelist.append(os.path.join(downloads, f"{i}.png"))

    pdf = FPDF()  # defaults: A4, portrait, millimetres
    for image in imagelist:
        pdf.add_page()
        # Stretch each image over the full A4 page (210 x 297 mm)
        pdf.image(image, 0, 0, 210, 297)

    pdf.output(os.path.join(downloads, "out.pdf"))

I'd like to take the time down to about 10 seconds per PDF, but so far I haven't gotten any useful advice. Any advice would be extremely welcome.

Thanks so much for any suggestions or recommendations!

Answer 1

Score: 0

My bet is that you will see the best performance with PyMuPDF:

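import fitz  # PyMuPDF

imglist = [...]  # your list of image filenames
doc = fitz.open()  # new empty PDF

for ifile in imglist:
    idoc = fitz.open(ifile)  # open the image as a document
    pdfbytes = idoc.convert_to_pdf()  # convert it to a one-page PDF in memory
    doc.insert_pdf(fitz.open("pdf", pdfbytes))  # append that page to the output

# garbage=3 drops unused and duplicate objects; deflate=True compresses streams
doc.save("myimages.pdf", garbage=3, deflate=True)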
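
To compare this against the PIL and FPDF timings from the question, the snippet can be wrapped in a function and timed. What follows is only a sketch under assumptions: the helper names build_image_paths, images_to_pdf and timed are made up for illustration, the file naming (1.png, 2.png, ...) is copied from the question, and the PyMuPDF calls are the same ones used above. No timing result is claimed here; measure on your own files.

```python
import os
import time


def build_image_paths(folder, count, ext=".png"):
    """Return folder/1.png ... folder/<count>.png, the naming used in the question."""
    return [os.path.join(folder, f"{i}{ext}") for i in range(1, count + 1)]


def images_to_pdf(image_paths, out_path):
    """Combine image files into one PDF, one page per image."""
    import fitz  # PyMuPDF; imported lazily so the other helpers work without it

    doc = fitz.open()  # new empty PDF
    for path in image_paths:
        img_doc = fitz.open(path)             # open the image as a document
        pdf_bytes = img_doc.convert_to_pdf()  # one-page PDF held in memory
        img_doc.close()
        img_pdf = fitz.open("pdf", pdf_bytes)
        doc.insert_pdf(img_pdf)               # append that page to the output
        img_pdf.close()
    doc.save(out_path, garbage=3, deflate=True)
    doc.close()


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Example call, using the folder from the question:
# paths = build_image_paths(r"C:\Users\USER\Downloads", 3)
# _, seconds = timed(images_to_pdf, paths, "myimages.pdf")
# print(f"PDF written in {seconds:.2f} s")
```

Timing the image loading and the final save separately with the same timed helper should show which stage is actually responsible for the delay.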

huangapple
  • Posted on February 8, 2023 at 20:53:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75386100.html