I'm not having any luck with PyPDF2 or PDFMiner. Both tools always return _______________ for the textboxes, even when they are filled in. Does anyone know how to extract the text inside the textbox fields?
What did you try with PyPDF2/PDFMiner? What did it return? – Jesse, May 25, 2018 at 1:14
stackoverflow.com/questions/15583535/…, stackoverflow.com/questions/34129936/…, stackoverflow.com/questions/26494211/… – Jesse, May 25, 2018 at 1:15
1 Answer
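You need to read the form fields, not the page text. The filled-in values are stored in the document's AcroForm dictionary rather than in the page content, so you need something like this:
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdftypes import resolve1

    # Open the PDF in binary mode; the parser reads from the file object.
    with open("c:\\tmp\\test.pdf", "rb") as fp:
        parser = PDFParser(fp)
        doc = PDFDocument(parser)
        # Form fields are listed under the catalog's AcroForm dictionary.
        fields = resolve1(resolve1(doc.catalog["AcroForm"])["Fields"])
        for i in fields:
            # Each entry is an indirect reference; resolve it to a dict.
            field = resolve1(i)
            # "T" is the field name, "V" is its current value (None if unset).
            name, value = field.get("T"), field.get("V")
            print("{0}:{1}".format(name, value))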
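If the PDF has no AcroForm entry at all (for example, a flattened form or a document without form fields), the catalog lookup raises KeyError: 'AcroForm'. Below is a more defensive sketch of the same idea, not a definitive implementation. The helper names decode, collect_fields and read_form are made up for illustration. It assumes that child fields are nested under "Kids", that nested names are joined with a dot, and that string values are either UTF-16 with a byte-order mark or roughly Latin-1.

```python
def decode(value):
    """Turn raw PDF string bytes into str; leave other types untouched."""
    if isinstance(value, bytes):
        # Assumption: text is UTF-16BE with a BOM, else Latin-1-compatible.
        if value.startswith(b"\xfe\xff"):
            return value[2:].decode("utf-16-be", errors="replace")
        return value.decode("latin-1")
    return value


def collect_fields(catalog, resolve=lambda obj: obj):
    """Return {dotted_field_name: value} from a document catalog dict.

    `resolve` dereferences indirect objects (pass pdfminer's resolve1).
    Returns an empty dict when the catalog has no AcroForm entry.
    """
    if "AcroForm" not in catalog:
        return {}
    acroform = resolve(catalog["AcroForm"])
    results = {}

    def walk(ref, prefix):
        field = resolve(ref)
        name = decode(resolve(field.get("T")))
        if name is None:
            return  # unnamed entry (e.g. a widget annotation), skip it
        full = prefix + "." + name if prefix else name
        kids = [resolve(k) for k in (resolve(field.get("Kids")) or [])]
        children = [k for k in kids if "T" in k]
        if children:
            # Non-terminal field: descend into its named children.
            for child in children:
                walk(child, full)
        else:
            # Terminal field: record its value (None if never filled in).
            results[full] = decode(resolve(field.get("V")))

    for ref in resolve(acroform.get("Fields")) or []:
        walk(ref, "")
    return results


def read_form(path):
    """Hypothetical wrapper: parse `path` with pdfminer and collect fields."""
    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument
    from pdfminer.pdftypes import resolve1

    with open(path, "rb") as fp:
        doc = PDFDocument(PDFParser(fp))
        return collect_fields(doc.catalog, resolve1)
```

Calling read_form("c:\\tmp\\test.pdf") would then give a dictionary of field names and values, or an empty dictionary when the file has no form. Values that are not strings (such as checkbox states) are returned as pdfminer produced them.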
1 Comment
Farhang Amaji
It didn't work for me; it raised a KeyError: 'AcroForm' error.