How to make a PDF from an online ebook that is displayed page by page?
Question
I would like to save books like this one as PDF files: https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/index.html. The viewer shows the book one page at a time.
How can I do this?
The only thing I have managed so far is to print each page to a PDF and then merge the separate PDF pages.
Is there a way to automate this in Python or another scripting language?
Answer 1
Score: 1
You can download the page images directly with requests and save them to a PDF with PIL. For example:
import requests
from io import BytesIO
from PIL import Image  # pip install Pillow

pdf_path = "doc.pdf"
# {} is filled with the zero-padded, four-digit page number (0001, 0002, ...)
url = 'https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/assets/page-images/page-113088-{}.jpg'

# Download each page image and open it with PIL.
# verify=False skips TLS certificate verification (requests will emit a warning).
images = [
    Image.open(BytesIO(requests.get(url.format(f'{p:04}'), verify=False).content))
    for p in range(1, 4)  # <-- increase the number of pages here (this saves the first 3 pages)
]

# borrowing from this answer: https://stackoverflow.com/a/47283224/10035985
images[0].save(
    pdf_path, "PDF", resolution=100.0, save_all=True, append_images=images[1:]
)
The resulting doc.pdf opens in Firefox.
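If you do not know the page count in advance, one possible variation is to keep requesting pages until the server stops returning them. The sketch below is only an illustration: the helper names (fetch_page, iter_pages, save_pdf) are hypothetical, and it assumes the server answers with HTTP 404 for a page number past the end of the book, which I have not verified for this site.

```python
from io import BytesIO
from typing import Callable, Iterable, Iterator, Optional

from PIL import Image  # pip install Pillow

URL = ("https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/"
       "assets/page-images/page-113088-{}.jpg")


def fetch_page(page: int) -> Optional[bytes]:
    """Return the JPEG bytes of one page, or None if the page does not exist.

    Assumption: a missing page is reported with HTTP 404.
    """
    import requests  # imported here so the helpers below work without requests

    resp = requests.get(URL.format(f"{page:04}"), verify=False, timeout=30)
    if resp.status_code == 404:
        return None
    resp.raise_for_status()  # fail loudly on any other HTTP error
    return resp.content


def iter_pages(fetch: Callable[[int], Optional[bytes]],
               max_pages: int = 2000) -> Iterator[Image.Image]:
    """Yield pages 1, 2, 3, ... until fetch() returns None or max_pages is hit."""
    for page in range(1, max_pages + 1):
        data = fetch(page)
        if data is None:
            break
        # Convert to RGB so every page uses a mode the PDF writer accepts.
        yield Image.open(BytesIO(data)).convert("RGB")


def save_pdf(images: Iterable[Image.Image], pdf_path: str) -> int:
    """Write all images into one PDF and return the number of pages written."""
    pages = list(images)
    if not pages:
        raise ValueError("no pages were downloaded")
    pages[0].save(pdf_path, "PDF", resolution=100.0,
                  save_all=True, append_images=pages[1:])
    return len(pages)


if __name__ == "__main__":
    count = save_pdf(iter_pages(fetch_page), "doc.pdf")
    print(f"saved {count} pages to doc.pdf")
```

Passing the download function in as an argument keeps the PDF-building part independent of the network, so you can try it with locally generated images first. The max_pages cap is just a safety net in case the server never returns a 404.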