How to summarize a PDF file into plain text, and create and place a new file on the desktop?

Question

I want to automatically convert PDF files into text and then save that output as a file on my desktop.

Example:

-- PDF converted to text: "HELLO WORLD"

-- a .txt file saved on the desktop, containing "HELLO WORLD".

I have tried:

    fp = open('/Users/zain/Desktop', 'pdf_summary')
    fp.write(text)

I thought this would save the file on my desktop, using the variable text, which holds the converted text, as the input.

Full code:

    from PyPDF2 import PdfReader

    reader = PdfReader("/Users/zain/Desktop/Week2_POL305_Manfieldetal.pdf")
    text = ""
    for page in reader.pages:
        text += page.extract_text() + "\n"
    print(text)
    fp = open('/Users/zain/Desktop', 'pdf_summary')
    fp.write(text)
    fp.write(text)

Answer 1

Score: 0

This works for me.

    from PyPDF2 import PdfReader

    # Path to the PDF file
    reader = PdfReader(r'C:\Users\zain\Desktop\Week2_POL305_Manfieldetal.pdf')
    text = ""
    for page in reader.pages:
        text += page.extract_text() + '\n'

    # Path of the output file on the desktop.
    # You can keep .txt, omit the extension, or change it to another file type.
    # Mode 'a' appends if the file already exists; the with-block closes the file.
    with open(r'C:\Users\zain\Desktop\pdf_summary.txt', 'a', encoding='utf-8') as fp:
        fp.write(text)
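
For context on why the attempt in the question fails: the second argument to Python's built-in `open()` is the mode (such as 'w' or 'a'), not a file name, so the file name has to be part of the path itself. Below is a minimal sketch of the saving step alone, assuming the text has already been extracted. The helper name `save_text_to_dir` is hypothetical, and the default file name is simply the one used in the question.

```python
from pathlib import Path


def save_text_to_dir(text: str, directory: Path, filename: str = "pdf_summary.txt") -> Path:
    """Write text to directory/filename as UTF-8 and return the full path.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    # The file name is joined onto the directory; it is not passed as the mode.
    target = Path(directory) / filename
    # Mode "w" overwrites an existing file; the with-block closes the file on exit.
    with open(target, "w", encoding="utf-8") as fp:
        fp.write(text)
    return target
```

Assuming the desktop folder is at `~/Desktop`, as the paths in the question suggest, a call such as `save_text_to_dir(text, Path.home() / "Desktop")` should write `pdf_summary.txt` there on any platform.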

Answer 2

Score: 0

A PDF may contain all sorts of things, not only text. You therefore have to extract the text from the PDF explicitly, if that is what you want.

With the PyMuPDF package, you could do it this way:

    import pathlib

    import fitz  # PyMuPDF

    doc = fitz.open("input.pdf")
    text = "\n".join([page.get_text() for page in doc])
    # Encoding to bytes (UTF-8 by default) supports non-ASCII text
    pathlib.Path("input.txt").write_bytes(text.encode())
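
Because a PDF can hold images and other non-text content, some pages may yield an empty string when text is extracted. The following is a minimal sketch of one way to join page texts while skipping such pages. It assumes the per-page strings have already been obtained (for example from `page.get_text()`), and `join_page_texts` is a hypothetical helper name, not part of PyMuPDF.

```python
from typing import Iterable, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text with newlines, skipping pages that yielded no text.

    Hypothetical helper for illustration; accepts None defensively.
    """
    kept = []
    for page_text in page_texts:
        # Pages without extractable text (e.g. scanned images) are skipped.
        if page_text and page_text.strip():
            kept.append(page_text.strip())
    return "\n".join(kept)
```

With this helper, the joining line could be written as `text = join_page_texts(page.get_text() for page in doc)`. Whether blank pages should be dropped or kept as empty lines depends on how the output will be used.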

huangapple
  • Posted on February 16, 2023, 04:09:55
  • Original link: https://go.coder-hub.com/75464984.html