How to convert a PDF file into plain text and save it as a new file on the desktop?
Question
I want to automatically convert PDF files to text and then save that output to a file on my desktop.
Example:
-- text converted from the PDF: "HELLO WORLD"
-- a .txt file saved on the desktop containing "HELLO WORLD".
I have done:
fp = open('/Users/zain/Desktop', 'pdf_summary')
fp.write(text)
I expected this to save a file on my desktop containing the contents of text, the variable that holds the converted text.
Full Code:
from PyPDF2 import PdfReader
reader = PdfReader("/Users/zain/Desktop/Week2_POL305_Manfieldetal.pdf")
text = ""
for page in reader.pages:
    text += page.extract_text() + "\n"
print(text)
fp = open('/Users/zain/Desktop', 'pdf_summary')
fp.write(text)
fp.write(text)
Answer 1
Score: 0
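This works for me. The first argument to open() has to be the full path of the output file, including its name, and the second argument is the mode.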
from PyPDF2 import PdfReader
# Path to the PDF file
reader = PdfReader(r'C:\Users\zain\Desktop\Week2_POL305_Manfieldetal.pdf')
text = ""
for page in reader.pages:
    text += page.extract_text() + '\n'
# Full path of the output file on the desktop, including the file name.
# You can keep the .txt extension, drop it, or use a different one.
# Mode 'a' appends to an existing file; use 'w' to overwrite it instead.
with open(r'C:\Users\zain\Desktop\pdf_summary.txt', 'a', encoding='utf-8') as fp:
    fp.write(text)
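The code in the question fails at the saving step because open() receives a folder as its first argument and 'pdf_summary', which is not a valid mode, as its second. Below is a minimal sketch of just that step. It assumes the desktop is the Desktop folder inside the home directory, which is common but not guaranteed. The helper name save_text and its parameters are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def save_text(text, filename="pdf_summary.txt", folder=None):
    """Write text to folder/filename and return the resulting path.

    folder defaults to ~/Desktop, which is an assumption about where
    the desktop lives on this machine.
    """
    target_dir = Path(folder) if folder is not None else Path.home() / "Desktop"
    target = target_dir / filename
    # "w" overwrites an existing file; UTF-8 keeps non-ASCII characters intact.
    with open(target, "w", encoding="utf-8") as fp:
        fp.write(text)
    return target
```

After the extraction loop, calling save_text(text) would write pdf_summary.txt to the desktop. Building the path from Path.home() avoids hard-coding a user name or a drive letter.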
Answer 2
Score: 0
A PDF may contain all sorts of content, not only text.
You therefore have to explicitly extract the text from a PDF, if that is what you want.
With the PyMuPDF package, you could do it this way:
import fitz # import pymupdf
import pathlib
doc = fitz.open("input.pdf")
text = "\n".join([page.get_text() for page in doc])
pathlib.Path("input.txt").write_bytes(text.encode()) # supports non ASCII text
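Since the question asks about converting PDF files automatically, here is a rough sketch of how the same idea might extend to every PDF in a folder. The function name convert_folder and its extract_text parameter are hypothetical. extract_text stands for any function that takes a PDF path and returns its text, so the sketch is not tied to PyMuPDF or PyPDF2.

```python
from pathlib import Path


def convert_folder(folder, extract_text):
    """Write a .txt file next to every PDF in folder.

    extract_text is any callable that takes a PDF path and returns its text,
    for example a small wrapper around the PyMuPDF or PyPDF2 code above.
    """
    written = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        # Same file name as the PDF, with the extension swapped to .txt
        txt_path = pdf_path.with_suffix(".txt")
        txt_path.write_text(extract_text(pdf_path), encoding="utf-8")
        written.append(txt_path)
    return written
```

With PyMuPDF, the extractor could be something like lambda p: "\n".join(page.get_text() for page in fitz.open(str(p))). Passing encoding="utf-8" to write_text plays the same role as text.encode() above and keeps non-ASCII text intact.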