How to read the data and associated field names from a filled-in PDF form
Question
I am writing a Python script that, as part of a larger script, needs to pull the data filled into a PDF form. I tried PyPDF3, but while it can show me the strings in the form, it does not show the filled-in data. I have a form where I entered the value 'XXX' into a field, and I want the script to return that value and the name of the field, but I can't seem to read the data. The fillpdfs module is very helpful, but AFAICT it can return the field names but not the data.
I have this snippet:
from PyPDF3 import PdfFileReader

# Open the PDF file (the with-block closes it when done)
with open('filename.pdf', 'rb') as pdf_file:
    pdf_reader = PdfFileReader(pdf_file)
    # Extract text data from each page
    for page_num in range(pdf_reader.numPages):
        page = pdf_reader.getPage(page_num)
        # True only if 'XXX' appears in the page's extracted text
        print('XXX' in page.extractText())
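For context: in a PDF form, the text typed into a field is generally not part of the page's content stream, which is what extractText() reads. The value is stored under the /V key of the field's dictionary in the document's /AcroForm /Fields tree, where fields can nest through /Kids and a field's full name joins the partial /T names with periods. That is the likely reason searching the extracted text for 'XXX' finds nothing. The sketch below walks such a tree; to stay self-contained it uses plain Python dicts shaped like PDF field dictionaries instead of a parsed file, so the sample data and the function name collect_field_values are hypothetical.

```python
def collect_field_values(fields, parent_name=""):
    """Walk PDF-style field dicts and return {fully qualified name: value}.

    Each dict mimics a PDF field dictionary: '/T' is the partial name,
    '/V' the value (absent when the field is empty) and '/Kids' an
    optional list of child fields.
    """
    values = {}
    for field in fields:
        partial = field.get("/T")
        if partial is None:
            # A kid without '/T' (e.g. a widget annotation) shares its parent's name.
            name = parent_name
        elif parent_name:
            # Full names join ancestor partial names with periods.
            name = f"{parent_name}.{partial}"
        else:
            name = partial
        if "/V" in field:
            values[name] = field["/V"]
        # Recurse into child fields, passing this field's full name down.
        values.update(collect_field_values(field.get("/Kids", []), name))
    return values


# Hypothetical sample shaped like a document's /AcroForm /Fields array.
sample_fields = [
    {"/T": "name", "/FT": "/Tx", "/V": "XXX"},
    {"/T": "address", "/Kids": [
        {"/T": "city", "/FT": "/Tx", "/V": "Springfield"},
        {"/T": "zip", "/FT": "/Tx"},  # left empty, so no /V entry
    ]},
]

print(collect_field_values(sample_fields))
# {'name': 'XXX', 'address.city': 'Springfield'}
```

In practice a PDF library parses the file and performs this walk, so you would normally call the library rather than hand-roll it; the sketch only shows where the filled-in values live.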
Answer 1
Score: 0
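PyPDF3's PdfFileReader has a method for PDF forms, getFormTextFields(), which returns the form's text fields as a dictionary mapping each field name to its value (None for an empty field). Using the pdf_reader from the question: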
dictionary = pdf_reader.getFormTextFields()  # dict of {field name: value}
print(dictionary)
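Combining the answer with the goal from the question (getting both the value and the field's name), here is a minimal sketch assuming PyPDF3 is installed and the form is 'filename.pdf'. The helper names read_form_values and fields_with_value are hypothetical. getFormTextFields() only covers text fields (field type /Tx); passing text_only=False switches to getFields(), which also returns checkboxes, radio buttons and choice fields. The PyPDF3 import sits inside the function so the pure-Python search helper can be used without the library.

```python
def read_form_values(path, text_only=True):
    """Return a {field name: value} dict for the form fields in `path`.

    Assumes PyPDF3 is installed. Empty fields map to None.
    """
    # Imported lazily so fields_with_value() works without PyPDF3.
    from PyPDF3 import PdfFileReader

    with open(path, "rb") as pdf_file:
        reader = PdfFileReader(pdf_file)
        if text_only:
            # Text fields (/FT /Tx) only.
            return reader.getFormTextFields()
        # All field types; getFields() returns None when the PDF has no form.
        fields = reader.getFields() or {}
        # Checkbox values are PDF names, e.g. '/Off' when unchecked.
        return {name: field.get("/V") for name, field in fields.items()}


def fields_with_value(fields, value):
    """Return the names of all fields whose current value equals `value`."""
    return [name for name, current in fields.items() if current == value]


# Hypothetical usage against the asker's form:
# values = read_form_values("filename.pdf")
# print(values)                            # every text field and its value
# print(fields_with_value(values, "XXX"))  # which field(s) hold 'XXX'
```

Searching the name-to-value dictionary, rather than the page text, is what recovers the field name together with its data.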