February 16, 2023 04:09:55

How to extract a PDF file's contents as plain text and save it as a new file on the desktop?

Question

I want to automatically convert PDF files to text and then save that output to a file on my desktop.

Example:

-- Text converted from the PDF: "HELLO WORLD"

-- A .txt file saved on the desktop containing "HELLO WORLD".

I have done:

fp = open('/Users/zain/Desktop', 'pdf_summary')
fp.write(text)

I thought this would save a file on my desktop, with text being the variable that holds the converted text.

Full Code:

from PyPDF2 import PdfReader
reader = PdfReader("/Users/zain/Desktop/Week2_POL305_Manfieldetal.pdf")
text = ""
for page in reader.pages:
    text += page.extract_text() + "\n"
print(text)
fp = open('/Users/zain/Desktop', 'pdf_summary')
fp.write(text)
fp.write(text)

Answer 1

Score: 0

This works for me.

from PyPDF2 import PdfReader

# Path to the PDF file
reader = PdfReader(r'C:\Users\zain\Desktop\Week2_POL305_Manfieldetal.pdf')
text = ""
for page in reader.pages:
    text += page.extract_text() + '\n'

# Path of the file to save on the desktop.
# You can keep the .txt extension, drop it, or change it to another file type.
# Mode 'a' appends to the file; the with block closes it after writing.
with open(r'C:\Users\zain\Desktop\pdf_summary.txt', 'a') as fp:
    fp.write(text)
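
For reference, the call in the question fails because the second positional argument of the built-in open() is the mode (such as 'w' or 'a'), not a file name, so Python rejects 'pdf_summary' as an invalid mode. The file name has to be part of the path instead. Below is a minimal sketch of that fix. It assumes the desktop folder is at ~/Desktop, and save_text with its parameters is a hypothetical helper name, not part of PyPDF2:

```python
from pathlib import Path


def save_text(text, filename="pdf_summary.txt", folder=None):
    """Write text to folder/filename and return the resulting path.

    folder defaults to ~/Desktop. That location is an assumption;
    pass another directory if your desktop lives elsewhere.
    """
    target_dir = Path(folder) if folder is not None else Path.home() / "Desktop"
    target = target_dir / filename
    # The file name is part of the path; the second argument is the mode.
    # "w" overwrites on every run, use "a" to append instead.
    with open(target, "w", encoding="utf-8") as fp:
        fp.write(text)
    return target
```

Calling save_text(text) after the extraction loop should then create pdf_summary.txt on the desktop, provided that folder exists.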

Answer 2

Score: 0

A PDF may contain all sorts of things, not only text.
You therefore have to explicitly extract the text from a PDF, if that is what you want.

With the PyMuPDF package you could do it this way:

import fitz  # import pymupdf
import pathlib
doc = fitz.open("input.pdf")
text = "\n".join([page.get_text() for page in doc])
pathlib.Path("input.txt").write_bytes(text.encode())  # supports non-ASCII text
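
Since the goal is to convert PDF files automatically, the same idea could be wrapped in a loop over a folder. The following is only a rough sketch, assuming PyMuPDF is installed. convert_folder, extract_with_pymupdf and the folder arguments are hypothetical names used for illustration:

```python
from pathlib import Path


def extract_with_pymupdf(pdf_path):
    """Return the text of every page in pdf_path, joined by newlines."""
    import fitz  # PyMuPDF, imported here so the rest works without it

    doc = fitz.open(str(pdf_path))
    try:
        return "\n".join(page.get_text() for page in doc)
    finally:
        doc.close()


def convert_folder(src_dir, dst_dir, extract=extract_with_pymupdf):
    """Write one .txt file into dst_dir for each .pdf found in src_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(Path(src_dir).glob("*.pdf")):
        out_path = dst / (pdf_path.stem + ".txt")
        # UTF-8 keeps non-ASCII text intact
        out_path.write_text(extract(pdf_path), encoding="utf-8")
        written.append(out_path)
    return written
```

The extract parameter lets you swap in a different extractor, for example one based on PyPDF2, without touching the loop.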
