I have the following PDF: https://www.dco.uscg.mil/Portals/9/NMC/pdfs/forms/CG_719B.pdf

I have tried a number of different ways to access its text boxes from code:

    async function fillAllFields() {
      const file = document.getElementById('pdf-upload').files[0];
      if (!file) return alert("Please upload a PDF");

      const arrayBuffer = await file.arrayBuffer();
      const pdfDoc = await PDFLib.PDFDocument.load(arrayBuffer);
      const form = pdfDoc.getForm();
      const fields = form.getFields();

      fields.forEach(field => {
        const name = field.getName();
        try {
          // getTextField() throws for checkboxes, buttons, and other non-text fields
          form.getTextField(name).setText(name);
          // Each field has one or more widget annotations that carry its rectangle
          field.acroField.getWidgets().forEach(widget => {
            const { x, y, width, height } = widget.getRectangle();
            console.log(`Field "${name}" at [${x}, ${y}, ${x + width}, ${y + height}]`);
          });
        } catch (e) {
          console.log(`Skipping field "${name}": ${e.message}`);
        }
      });

      // Save the modified document and trigger a download in the browser
      const pdfBytes = await pdfDoc.save();
      const blob = new Blob([pdfBytes], { type: "application/pdf" });
      const link = document.createElement("a");
      link.href = URL.createObjectURL(blob);
      link.download = "filled_with_names.pdf";
      link.click();
    }

However, this does not give me access to the text boxes. I also tried adding placeholder text such as {{First_name}} above each box, hoping I could find that text and replace it, but when I use pdfplumber to extract the text, the placeholder is not returned:

import pdfplumber

with pdfplumber.open("CG_719B_filled.pdf") as pdf:
    for page in pdf.pages:
        # page_number is 1-based; print only the text of page 2
        if page.page_number == 2:
            print(page.extract_text())

So now I am checking for any kind of AcroForm, and the file does not seem to have one.

import pdfplumber
from pdfplumber.utils.pdfinternals import resolve_and_decode, resolve

pdf = pdfplumber.open("CG_719B_filled.pdf")


def parse_field_helper(form_data, field, prefix=None):
    """appends any PDF AcroForm field/value pairs in `field` to provided `form_data` list

    if `field` has child fields, those will be parsed recursively.
    """
    resolved_field = field.resolve()
    field_name = ".".join(
        filter(lambda x: x, [prefix, resolve_and_decode(resolved_field.get("T"))])
    )
    if "Kids" in resolved_field:
        for kid_field in resolved_field["Kids"]:
            parse_field_helper(form_data, kid_field, prefix=field_name)
    if "T" in resolved_field or "TU" in resolved_field:
        # "T" is a field-name, but it's sometimes absent.
        # "TU" is the "alternate field name" and is often more human-readable
        # your PDF may have one, the other, or both.
        alternate_field_name = (
            resolve_and_decode(resolved_field.get("TU"))
            if resolved_field.get("TU")
            else None
        )
        field_value = (
            resolve_and_decode(resolved_field["V"]) if "V" in resolved_field else None
        )
        form_data.append([field_name, alternate_field_name, field_value])


form_data = []

# Check if the PDF has an AcroForm (interactive form fields)
if "AcroForm" in pdf.doc.catalog:
    acro_form = resolve(pdf.doc.catalog["AcroForm"])
    if "Fields" in acro_form:
        fields = resolve(acro_form["Fields"])
        for field in fields:
            parse_field_helper(form_data, field)
        print(form_data)
    else:
        print("PDF has AcroForm but no Fields")
else:
    print("PDF does not contain an AcroForm (no interactive form fields)")

pdf.close()

PDF does not contain an AcroForm (no interactive form fields) :(

Why did I think populating a PDF form was going to be so easy? I'm at a loss as to which path to take. I'm almost tempted to rebuild the whole form in something that can be filled quickly by replacing variables.

I would appreciate it if someone could explain what exactly the issue is and how I could resolve it: either convert this PDF to an AcroForm with fields and a simple way to reference them and add the data, or recreate the form in a way that can be filled in via code.

  • With regards to Python, pdfplumber is mainly for (reading) tables. pypdf is the "main" package for writing and form parsing. The code in github.com/py-pdf/pypdf/issues/2780 fills in all the fields in your PDF for me (excluding the checkboxes). Commented Oct 21 at 7:48
  • @jqurious That was similar to the code I was using with the plumber; it still gives me raise PyPdfError("No /AcroForm dictionary in PDF of PdfWriter Object"). Commented Oct 21 at 15:37
  • Interesting indeed. Are you able to parse out the form data? Commented Oct 21 at 18:05
  • @jqurious With the PDF linked in the question? Commented Oct 21 at 18:36
  • Nice. @jqurious, are you on Windows or Mac? Commented Oct 21 at 20:58

1 Answer

I have used Python with PyMuPDF to update and fill PDF forms.

The following code does several things:

  • It reads a PDF and iterates through the pages to identify form fields (widgets),

  • It prints the properties of the widgets,

  • It updates the widget values if the field is in the dictionary (values) of known values.

I ran the script first to identify the field names to start building the dictionary. I added the update code and ran the script a second time to populate and save the PDF with an updated filename.

It looks like this PDF has some problems: field names are not unique. Notably, the "Addicted" checkboxes all share the same field_name.
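As a side note before the main script, here is a minimal sketch of one way to confirm the duplicates: count how often each widget's field_name occurs. The helper names find_duplicates and widget_names are mine, not part of PyMuPDF, and a repeated name does not necessarily mean the PDF is broken, since one field can legitimately own several widgets.

```python
from collections import Counter


def find_duplicates(names):
    """Return {name: count} for every name that occurs more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def widget_names(path):
    """Collect the field_name of every widget in the PDF at `path`."""
    import pymupdf  # imported lazily so find_duplicates() works without it

    pdf = pymupdf.open(path)
    try:
        return [w.field_name for page in pdf for w in page.widgets()]
    finally:
        pdf.close()
```

Calling find_duplicates(widget_names('CG_719B.pdf')), assuming the file is in the current directory, should list the names that appear on more than one widget.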

The script below has some lines commented out that may be useful for digging deeper into PDF file structure, analysis, and manipulation.

# use PyMuPDF to update PDF
import pymupdf 

def iscallable(obj, attr):
    try:
        return callable(getattr(obj, attr))
    except Exception:
        # reading the attribute failed; report it as callable so printAttr skips it
        return True

def printAttr(prefix, obj):
    attributes = [attr for attr in dir(obj) if not iscallable(obj, attr)]
    for attr in attributes:
        print(prefix, f"{attr}: {getattr(obj, attr)}")

values = {
    'form1[0].#subform[2].LastName[0]': 'Duck',
    'form1[0].#subform[2].MiddleName[0]': 'Fauntleroy ',
    'form1[0].#subform[2].FirstName[0]': 'Donald',
    'form1[0].#subform[2].Birthdate[0]': '05/03/1934',
    'form1[0].#subform[2].CityApplicant[0]': 'Burbank',
    'form1[0].#subform[2].StateApplicant[0]': 'CA',
    'form1[0].#subform[2].ZipApplicant[0]': '91521',
    'form1[0].#subform[2].StreetAddrApplicant[0]': '500 S Buena Vista St'
}

indir = './'
fname = 'CG_719B.pdf'

pdf = pymupdf.open(indir + fname)
print('***', fname)
print(dir(pdf))
printAttr('pdf', pdf)
for page in pdf:
    widgets = page.widgets()
    for widget in widgets:
        #print('dir(widget)', dir(widget))
        #printAttr('widget', widget)
        print('widget field_name:', widget.field_name)
        print('widget field_type_string:', widget.field_type_string)
        print('widget field_value:', widget.field_value)
        fieldname = widget.field_name
        # if fieldname in dictionary of values to enter, update
        if fieldname in values:
            widget.field_value = values[fieldname]
            widget.update()

        # print('dir(widget._annot)', dir(widget._annot))
        # printAttr('widget._annot.flags', widget._annot.flags)

pdf.save(fname.split('.')[0] + '_updated.pdf')
pdf.close()

Screen shot of filled PDF that was created by the code
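If the duplicate field names get in the way (the "Addicted" checkboxes, for example), one option is to key the values by each widget's xref, which is unique per widget, instead of by field_name. The following is only a sketch under that assumption: fill_widgets and values_by_xref are hypothetical names of mine, the xref numbers would have to come from printing widget.xref in a first listing pass, and I have not verified it against every field in this form. It relies on PyMuPDF accepting True/False as a checkbox field_value.

```python
def fill_widgets(pdf, values_by_xref):
    """Set widget values keyed by widget.xref rather than by field_name.

    `pdf` is an open PyMuPDF document (anything iterable over pages whose
    widgets() yields widget objects). Returns the number of widgets updated.
    """
    filled = 0
    for page in pdf:
        for widget in page.widgets():
            if widget.xref not in values_by_xref:
                continue
            wanted = values_by_xref[widget.xref]
            if widget.field_type_string == "CheckBox":
                # checkboxes take a boolean on/off value
                widget.field_value = bool(wanted)
            else:
                widget.field_value = str(wanted)
            widget.update()  # write the new value back to the PDF object
            filled += 1
    return filled
```

The returned count makes it easy to notice when an xref in the dictionary did not match any widget.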
