Convert PDF to PNG using pdf2image
Question
I'm trying to convert a PDF to PNG using pdf2image without going through a file path. I want to pass an InMemoryUploadedFile to the conversion function instead of specifying the path to a PDF file. I don't think pdf2image can do this, so is there another way?
The code below is a test outside my main project; I'm trying to figure out how to do this before integrating the function. Here, file is opened from a path but is then passed in to stand in for an InMemoryUploadedFile.
from pdf2image import convert_from_path, convert_from_bytes
file = open(r"PDF\pdf_files\NDB.pdf")
images = convert_from_path(file, poppler_path=r"C:\Python\poppler\poppler-23.07.0\Library\bin")
for pdf in images:
    for i in range(len(pdf)):
        # Save pages as images in the pdf
        images[i].save(f'PDF\image_mods\image_converted_{i+1}.png', 'PNG')
Answer 1
Score: 0
You are correct that pdf2image has no function that takes an InMemoryUploadedFile directly. However, you can read the raw PDF bytes from the upload and pass them to pdf2image's convert_from_bytes; PyPDF2 can optionally be used to get the page count. Below is an example of how you can achieve this:
from io import BytesIO
import PyPDF2
from pdf2image import convert_from_bytes
# Assuming "file" is an InMemoryUploadedFile object containing the PDF content
# Read the PDF content from the InMemoryUploadedFile
pdf_content = file.read()
# Create a BytesIO object to handle the PDF content
pdf_stream = BytesIO(pdf_content)
# Use PyPDF2 to get the number of pages in the PDF (optional step)
# PdfReader and .pages replace PdfFileReader and numPages, which were removed in PyPDF2 3.0
pdf_reader = PyPDF2.PdfReader(pdf_stream)
num_pages = len(pdf_reader.pages)
# Convert the PDF content to images using pdf2image
images = convert_from_bytes(pdf_content, poppler_path=r"C:\Python\poppler\poppler-23.07.0\Library\bin")
# Save each page as a PNG file
for i, page in enumerate(images):
    # Raw f-string so the backslashes in the Windows path are not treated as escapes
    page.save(rf'PDF\image_mods\image_converted_{i + 1}.png', 'PNG')
Here, we read the PDF content from the InMemoryUploadedFile as bytes and wrap it in a BytesIO stream so that PyPDF2 can read it and report the number of pages (this step is optional if you don't need the page count). Finally, we pass the PDF bytes to pdf2image's convert_from_bytes function to convert them to images, and then save each page as a PNG file.
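If the PNGs should also stay in memory rather than being written to disk, the same idea can be wrapped in a small helper. This is only a sketch under some assumptions: the function name pdf_upload_to_png_bytes and its convert parameter are made up for illustration (the parameter exists so the helper can be tried without Poppler installed), and the upload is assumed to be a binary file-like object with read() and seek(), which Django's InMemoryUploadedFile provides.

```python
from io import BytesIO


def pdf_upload_to_png_bytes(uploaded_file, convert=None, **convert_kwargs):
    """Return one PNG-encoded bytes object per page of an uploaded PDF.

    uploaded_file: binary file-like object with read() and seek(),
        for example a Django InMemoryUploadedFile.
    convert: hypothetical hook, a callable that takes PDF bytes and returns
        PIL-style images; defaults to pdf2image.convert_from_bytes.
    convert_kwargs: forwarded to the converter (e.g. poppler_path, dpi).
    """
    if convert is None:
        # Imported lazily so the helper can be exercised with a stub converter.
        from pdf2image import convert_from_bytes as convert

    uploaded_file.seek(0)  # rewind in case the upload was already read
    pdf_bytes = uploaded_file.read()

    png_pages = []
    for page in convert(pdf_bytes, **convert_kwargs):
        buffer = BytesIO()
        page.save(buffer, format="PNG")  # PIL images can save to a file object
        png_pages.append(buffer.getvalue())
    return png_pages
```

In a Django view this might be called as pdf_upload_to_png_bytes(request.FILES["pdf"], poppler_path=...), where "pdf" is an assumed form field name; each returned bytes object can then be stored or sent in a response without creating temporary files.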
Make sure to install both the PyPDF2 and pdf2image libraries if you haven't already (pdf2image also relies on Poppler, which is what the poppler_path argument points to):
pip install PyPDF2
pip install pdf2image