How to read the data and the associated field name that is in a filled-in PDF form

Question

I am writing a Python script that needs to pull the data filled into a PDF form, as part of a larger script. I tried PyPDF3, but while it can show me the strings in the form, it does not show the filled-in data. I have a form in which I entered the value 'XXX' into a field, and I want the script to return that value together with the name of the field, but I can't seem to read the data. The fillpdfs module is very helpful, but as far as I can tell it returns the field names and not the data.
I have this snippet:

    from PyPDF3 import PdfFileReader

    # Open the PDF file
    with open('filename.pdf', 'rb') as pdf_file:
        pdf_reader = PdfFileReader(pdf_file)

        # Extract the text of each page and check whether it contains 'XXX'
        for page_num in range(pdf_reader.numPages):
            page = pdf_reader.getPage(page_num)
            print(page_num, 'XXX' in page.extractText())

Answer 1

Score: 0

The reader object has a method for PDF form fields:

    # Returns a Python dictionary mapping each text field's name to its value
    dictionary = pdf_reader.getFormTextFields()
    print(dictionary)

Documentation
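
If the form also contains non-text fields (checkboxes, radio buttons), or the PDF may have no form at all, a slightly more defensive variant could look like the sketch below. It assumes PyPDF3 still exposes the `getFields()` method it inherited from PyPDF2 1.26, which returns a dictionary of field objects keyed by field name, or `None` when the document has no form. The helper names `extract_values` and `read_form` are made up for this example.

```python
def extract_values(fields):
    """Map each field name to its '/V' (value) entry, or None if unfilled."""
    if not fields:
        # getFields() is assumed to return None when the PDF has no form
        return {}
    return {name: field.get("/V") for name, field in fields.items()}


def read_form(path):
    """Return {field name: value} for every form field in the PDF at `path`."""
    # Imported here so extract_values() can be used without PyPDF3 installed
    from PyPDF3 import PdfFileReader

    with open(path, "rb") as pdf_file:
        reader = PdfFileReader(pdf_file)
        return extract_values(reader.getFields())
```

Calling `read_form('filename.pdf')` should then give one entry per field, with `None` for fields that were left empty.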

huangapple
  • Posted on February 9, 2023, 00:18:50
  • Please keep this link when reposting: https://go.coder-hub.com/75388705.html