PDF generation from a list of images takes too long - Python
Question
I'm trying to generate a PDF from a list of 3 images, but it has become the bottleneck in my program, taking up to 30 seconds per PDF. I need to process a very large number of images, so this is far too slow. None of the solutions I have tried so far has helped much. The three images I'm testing with are 60 KB, 125 KB and 134 KB respectively.
I've tried using PIL, which takes around 27 seconds per PDF. I used the following code:
import os
from PIL import Image

def pil_pdf():  # 27 sec
    downloads = r"C:\Users\USER\Downloads"
    file_nmbr = 3
    imagelist = []
    for i in range(1, file_nmbr + 1):
        # Open each PNG and convert it to RGB before saving as PDF
        current_image = Image.open(os.path.join(downloads, f"{i}.png")).convert("RGB")
        imagelist.append(current_image)
    out_path = os.path.join(downloads, "out_vPIL.pdf")
    # Write the first image and append the others as additional pages
    imagelist[0].save(out_path, save_all=True, append_images=imagelist[1:])
... as well as with FPDF:
import os
from fpdf import FPDF

def new_pdf():  # 25 sec
    downloads = r"C:\Users\USER\Downloads"
    file_nmbr = 3
    imagelist = []
    for i in range(1, file_nmbr + 1):
        imagelist.append(os.path.join(downloads, f"{i}.png"))
    pdf = FPDF()
    for image in imagelist:
        pdf.add_page()
        # Stretch each image over a full A4 page (210 x 297 mm)
        pdf.image(image, 0, 0, 210, 297)
    pdf.output(os.path.join(downloads, "out.pdf"))
I'd like to bring the time down to about 10 seconds per PDF, but so far I haven't gotten any useful advice. Any advice would be extremely welcome.
Thanks so much for any suggestions or recommendations!
Answer 1
Score: 0
Let me venture a bet: you should see the best performance with PyMuPDF:
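import fitz  # import PyMuPDF

imglist = [...]  # your list of image filenames
doc = fitz.open()  # new empty PDF
for ifile in imglist:
    idoc = fitz.open(ifile)  # open the image as a document
    pdfbytes = idoc.convert_to_pdf()  # make a one-page PDF in memory
    doc.insert_pdf(fitz.open("pdf", pdfbytes))  # append that page to the output
# garbage=3 removes unused and duplicate objects; deflate=True compresses streams
doc.save("myimages.pdf", garbage=3, deflate=True)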
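Before and after switching libraries, it may help to check which stage actually eats the time (opening the images, converting them, or writing the file). Below is a minimal sketch using only the standard library. The names `timed`, `report` and `timings` are hypothetical helpers, not part of PyMuPDF, PIL or FPDF, and the work inside the `with` blocks is a stand-in for the real calls from the snippets above.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Accumulate, so a label used once per image adds up to a total.
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)


def report(timings):
    """Return 'label: seconds' lines, slowest stage first."""
    ordered = sorted(timings.items(), key=lambda item: item[1], reverse=True)
    return [f"{label}: {seconds:.3f} s" for label, seconds in ordered]


# Placeholder work only: swap these bodies for the real
# open / convert / save calls you want to measure.
timings = {}
with timed("convert", timings):
    sum(i * i for i in range(100_000))  # stand-in for per-image work
with timed("save", timings):
    time.sleep(0.01)  # stand-in for writing the PDF

print("\n".join(report(timings)))
```

If one stage dominates (for example the final save), that is the part worth tuning first, whichever library ends up being used.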