June 19, 2023, 15:55:30

How to make a PDF from an online ebook that is displayed page by page?

Question

I would like to save books like this one, which is displayed page by page, as a PDF: https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/index.html

How can I do this?

The only thing I have managed so far is to print each page to a PDF and then combine the separate PDF pages.

Is there a way to do this automatically in Python or another scripting language?

Answer 1

Score: 1

You can download the document's page images directly with requests and save them to a PDF with PIL. For example:

import requests
from PIL import Image  # pip install Pillow
from io import BytesIO

pdf_path = "doc.pdf"
# {} is replaced by the zero-padded, four-digit page number (0001, 0002, ...)
url = 'https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/assets/page-images/page-113088-{}.jpg'

images = [
    # verify=False skips TLS certificate verification for this request
    Image.open(BytesIO(requests.get(url.format(f'{p:>04}'), verify=False).content))
    for p in range(1, 4)  # <-- increase number of pages here (now it will save first 3 pages)
]

# borrowing from this answer: https://stackoverflow.com/a/47283224/10035985
images[0].save(
    pdf_path, "PDF", resolution=100.0, save_all=True, append_images=images[1:]
)

The resulting doc.pdf opens in Firefox.
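
If the number of pages is not known in advance, one possible variation is to keep requesting page images until the server stops returning them. The following is only a sketch under stated assumptions: the helper names `fetch_page`, `iter_pages`, and `save_pdf`, the `max_pages` safety limit, and the 30-second timeout are all hypothetical additions, and it assumes the server answers a request for a nonexistent page with a non-200 status code, which has not been checked against this site.

```python
from io import BytesIO
from typing import Callable, Iterator, Optional, Sequence

# URL pattern copied from the answer above; {} is the zero-padded page number.
URL = ("https://kcenter.korean.go.kr/repository/ebook/culture/SB_step3/"
       "assets/page-images/page-113088-{}.jpg")


def fetch_page(page: int) -> Optional[bytes]:
    """Return the JPEG bytes for one page, or None if the page is missing."""
    import requests  # third-party (pip install requests), imported lazily

    # verify=False mirrors the answer above (skips TLS certificate checks).
    response = requests.get(URL.format(f"{page:04}"), verify=False, timeout=30)
    # Assumption: a missing page yields a non-200 status code.
    return response.content if response.status_code == 200 else None


def iter_pages(
    fetch: Callable[[int], Optional[bytes]], max_pages: int = 1000
) -> Iterator[bytes]:
    """Yield page images from page 1 onward until fetch() returns None."""
    for page in range(1, max_pages + 1):
        data = fetch(page)
        if data is None:
            break
        yield data


def save_pdf(pages: Sequence[bytes], pdf_path: str) -> None:
    """Combine the downloaded images into a single multi-page PDF."""
    from PIL import Image  # third-party (pip install Pillow), imported lazily

    if not pages:
        raise ValueError("no pages were downloaded")
    # Convert to RGB so that every page uses a mode the PDF writer accepts.
    images = [Image.open(BytesIO(data)).convert("RGB") for data in pages]
    images[0].save(
        pdf_path, "PDF", resolution=100.0, save_all=True, append_images=images[1:]
    )
```

Calling `save_pdf(list(iter_pages(fetch_page)), "doc.pdf")` would then download every available page and write them to doc.pdf. Passing the fetch function as a parameter keeps the stopping logic separate from the network call, so the loop can be tried out with a stand-in function before hitting the real site.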
