
This seems like something that should be quick to do, but in practice it turns out to be tricky. I have a bunch of PDF forms that include form fields and embedded JavaScript. I would like to safely remove the JavaScript code but leave the PDF form fields intact.

So far I've found lots of solutions, but each of them either removes both the JavaScript and the form fields, or keeps both.

Here's solution A; it copies both the form fields and the JavaScript:

var pdfReader = new PdfReader(infilename);
using (MemoryStream memoryStream = new MemoryStream()) {
    PdfCopyFields copy = new PdfCopyFields(memoryStream);
    copy.AddDocument(pdfReader);
    copy.Close();
    File.WriteAllBytes(rawfilename, memoryStream.ToArray());
}

Alternatively, here's solution B, which strips out both the form fields and the JavaScript:

Document document = new Document();
using (MemoryStream memoryStream = new MemoryStream()) {
    PdfWriter writer = PdfWriter.GetInstance(document, memoryStream);
    document.Open();
    document.AddDocListener(writer);
    for (int p = 1; p <= pdfReader.NumberOfPages; p++) {
        document.SetPageSize(pdfReader.GetPageSize(p));
        document.NewPage();
        PdfContentByte cb = writer.DirectContent;
        PdfImportedPage pageImport = writer.GetImportedPage(pdfReader, p);
        int rot = pdfReader.GetPageRotation(p);
        if (rot == 90 || rot == 270) {
            cb.AddTemplate(pageImport, 0, -1.0F, 1.0F, 0, 0, pdfReader.GetPageSizeWithRotation(p).Height);
        } else {
            cb.AddTemplate(pageImport, 1.0F, 0, 0, 1.0F, 0, 0);
        }
    }
    document.Close();
    File.WriteAllBytes(rawfile, memoryStream.ToArray());
}

Does anyone know how to modify either solution A or B to eliminate the JavaScript but leave the form fields in place?

EDIT: Here is the solution code, based on the answer below:

using (MemoryStream memoryStream = new MemoryStream()) {
    PdfStamper stamper = new PdfStamper(pdfReader, memoryStream);
    for (int i = 0; i < pdfReader.XrefSize; i++) {
        object o = pdfReader.GetPdfObject(i);
        PdfDictionary pd = o as PdfDictionary;
        if (pd != null) {
            pd.Remove(PdfName.AA);
            pd.Remove(PdfName.JS);
            pd.Remove(PdfName.JAVASCRIPT);
        }
    }
    stamper.Close();
    pdfReader.Close();
    File.WriteAllBytes(rawfile, memoryStream.ToArray());
}

2 Answers

4

To manipulate a single PDF, you should use the PdfStamper class and manipulate its contents; in your case, iterate over the existing form fields and remove their JavaScript entries.

The iTextSharp sample AddJavaScriptToForm.cs, which corresponds to AddJavaScriptToForm.java from chapter 13 of iText in Action, 2nd Edition, shows how JavaScript actions are added to fields. The central code is:

PdfStamper stamper = new PdfStamper(reader, ms);

AcroFields form = stamper.AcroFields;
AcroFields.Item fd = form.GetFieldItem("married");

PdfDictionary dictYes = (PdfDictionary) PdfReader.GetPdfObject(fd.GetWidgetRef(0));
PdfDictionary yesAction = ...;
dictYes.Put(PdfName.AA, yesAction);

Thus, to remove such JavaScript form field actions, you have to iterate over all those PDF form fields and remove the /AA entries from the associated dictionaries:

dictXXX.Remove(PdfName.AA);
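A minimal sketch of that iteration, assuming iTextSharp 5.x, where AcroFields.Fields maps field names to AcroFields.Item objects and each item exposes, per instance, the widget annotation dictionary (GetWidget) and the field dictionary (GetValue). The class and method names are hypothetical. It only strips /AA entries; JavaScript attached elsewhere, such as a document-level script or a push button's /A action, is left in place.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper, sketched against the iTextSharp 5.x API.
public static class FieldActionRemover
{
    // Returns a copy of the PDF in which every form field and widget
    // annotation has lost its /AA (additional actions) entry.
    // The fields themselves (names, values, appearances) are kept.
    public static byte[] RemoveFieldActions(byte[] pdf)
    {
        PdfReader reader = new PdfReader(pdf);
        using (MemoryStream ms = new MemoryStream())
        {
            PdfStamper stamper = new PdfStamper(reader, ms);
            AcroFields form = stamper.AcroFields;
            foreach (KeyValuePair<string, AcroFields.Item> field in form.Fields)
            {
                AcroFields.Item item = field.Value;
                // A field can have several widgets (e.g. the same field on several pages).
                for (int i = 0; i < item.Size; i++)
                {
                    // Widget-level triggers: mouse enter/exit, focus, blur, ...
                    item.GetWidget(i).Remove(PdfName.AA);
                    // Field-level triggers: keystroke, format, validate, calculate.
                    // For a merged field/widget this is the same dictionary.
                    item.GetValue(i).Remove(PdfName.AA);
                }
            }
            stamper.Close(); // also closes ms; ToArray still works afterwards
            reader.Close();
            return ms.ToArray();
        }
    }
}
```

With the file names from the question, usage would look like File.WriteAllBytes(rawfile, FieldActionRemover.RemoveFieldActions(File.ReadAllBytes(infilename)));.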

EDIT: (provided by Ted Spence) Here is the final code that successfully removes the JavaScript while leaving all form fields intact:

using System.IO;
using iTextSharp.text.pdf;

using (MemoryStream memoryStream = new MemoryStream())
{
    PdfStamper stamper = new PdfStamper(pdfReader, memoryStream);
    // Visit every object number in the cross-reference table
    for (int i = 0; i < pdfReader.XrefSize; i++)
    {
        PdfDictionary pd = pdfReader.GetPdfObject(i) as PdfDictionary;
        if (pd != null)
        {
            pd.Remove(PdfName.AA); // Removes additional-actions dictionaries (trigger-based actions)
            pd.Remove(PdfName.JS); // Removes the script of JavaScript action dictionaries
            pd.Remove(PdfName.JAVASCRIPT); // Removes the /JavaScript name tree (document-level scripts)
        }
    }
    stamper.Close();
    pdfReader.Close();
    File.WriteAllBytes(rawfile, memoryStream.ToArray());
}

EDIT: (by mkl) The solution above is somewhat overachieving because it touches each and every indirect dictionary object. On the other hand, it ignores dictionaries nested inline (directly) inside those objects. (I haven't checked the spec, though; maybe all /AA, /JS, and /JavaScript entries appear only in dictionaries that have to be indirect objects, or that are at least dereferenced by this code.)
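One way to close the inline-dictionary gap is to let the per-object step also descend into dictionaries and arrays nested directly inside each object. This is a sketch assuming iTextSharp 5.x, with hypothetical names. Indirect references are skipped because the xref loop reaches those objects on its own, and direct objects cannot form cycles, so the recursion terminates. Like the original loop, it leaves /S /JavaScript action dictionaries in place, just without their /JS code.

```csharp
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper, sketched against the iTextSharp 5.x API.
public static class JavaScriptStripper
{
    // Removes /AA, /JS and /JavaScript from obj and from every dictionary or
    // array nested *directly* inside it. Indirect references are not followed.
    public static void StripJavaScriptKeys(PdfObject obj)
    {
        if (obj == null || obj.IsIndirect())
            return;
        if (obj.IsDictionary() || obj.IsStream())
        {
            PdfDictionary dict = (PdfDictionary)obj;
            dict.Remove(PdfName.AA);
            dict.Remove(PdfName.JS);
            dict.Remove(PdfName.JAVASCRIPT);
            // Only child objects are modified below, so iterating the keys is safe.
            foreach (PdfName key in dict.Keys)
                StripJavaScriptKeys(dict.Get(key)); // raw value, not dereferenced
        }
        else if (obj.IsArray())
        {
            PdfArray array = (PdfArray)obj;
            for (int i = 0; i < array.Size; i++)
                StripJavaScriptKeys(array.GetPdfObject(i)); // raw element
        }
    }

    // Same overall shape as the final code above, using the recursive step.
    public static byte[] Strip(byte[] pdf)
    {
        PdfReader reader = new PdfReader(pdf);
        using (MemoryStream ms = new MemoryStream())
        {
            PdfStamper stamper = new PdfStamper(reader, ms);
            for (int i = 0; i < reader.XrefSize; i++)
                StripJavaScriptKeys(reader.GetPdfObject(i));
            stamper.Close();
            reader.Close();
            return ms.ToArray();
        }
    }
}
```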

If this task were my job, I would instead try to access the objects that can carry JavaScript more specifically.

One advantage of this overachieving procedure, though, might be that it also inspects PDF objects that are not currently specified as carrying JavaScript but may be in later PDF versions.
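For the more targeted approach suggested above, a sketch might visit only some of the main places where JavaScript can be attached: the /JavaScript name tree, the catalog's /AA and /OpenAction, each page's /AA, and each annotation's /AA and /A. It assumes iTextSharp 5.x and uses hypothetical names. It is not exhaustive: it does not follow /Next action chains, does not look at outline (bookmark) actions, and does not touch the /AA of field dictionaries that are separate from their widget annotations.

```csharp
using System.IO;
using iTextSharp.text.pdf;

// Hypothetical helper, sketched against the iTextSharp 5.x API.
public static class TargetedJavaScriptRemover
{
    // True if the dictionary is an action of type /S /JavaScript.
    static bool IsJavaScriptAction(PdfDictionary action)
    {
        return action != null && PdfName.JAVASCRIPT.Equals(action.GetAsName(PdfName.S));
    }

    public static byte[] Remove(byte[] pdf)
    {
        PdfReader reader = new PdfReader(pdf);
        using (MemoryStream ms = new MemoryStream())
        {
            PdfStamper stamper = new PdfStamper(reader, ms);
            PdfDictionary catalog = reader.Catalog;
            // Document-level scripts live in the /JavaScript name tree.
            PdfDictionary names = catalog.GetAsDict(PdfName.NAMES);
            if (names != null)
                names.Remove(PdfName.JAVASCRIPT);
            // Document triggers (will close, will save, ...) and a JavaScript open action.
            catalog.Remove(PdfName.AA);
            if (IsJavaScriptAction(catalog.GetAsDict(PdfName.OPENACTION)))
                catalog.Remove(PdfName.OPENACTION);
            for (int p = 1; p <= reader.NumberOfPages; p++)
            {
                PdfDictionary page = reader.GetPageN(p);
                page.Remove(PdfName.AA); // page open/close triggers
                PdfArray annots = page.GetAsArray(PdfName.ANNOTS);
                if (annots == null)
                    continue;
                for (int i = 0; i < annots.Size; i++)
                {
                    PdfDictionary annot = annots.GetAsDict(i);
                    if (annot == null)
                        continue;
                    // Widget triggers, plus field triggers when field and widget are merged.
                    annot.Remove(PdfName.AA);
                    // E.g. a push button or link whose click action is JavaScript.
                    if (IsJavaScriptAction(annot.GetAsDict(PdfName.A)))
                        annot.Remove(PdfName.A);
                }
            }
            stamper.Close();
            reader.Close();
            return ms.ToArray();
        }
    }
}
```

Because it only edits known locations, the form fields and their widgets stay as they are; the trade-off is the one described above, since anything outside these locations is missed.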


4 Comments

Thanks! Let me give this a try and see how it does.
Success! This did the job. Let me modify your answer to put in exactly how the final code reads.
@TedSpence When I saw your edit proposal, it had already been rejected as incorrect or an attempt to reply to or comment on the existing post. I included it here but have to add that your solution is overachieving because it touches each and every indirect dictionary object. On the other hand it ignores inline dictionaries.
There's definitely lots to learn about the PDF spec. I'd love to create a more robust solution, so if there are ways to improve it I'd love to see them.
-1

Add the following lines after the for loop to keep the AcroForm:

var form = pdfReader.AcroForm;
if (form != null)
   writer.CopyAcroForm(reader);

2 Comments

The function "CopyAcroForm(reader)" doesn't seem to exist in my iTextSharp, and I downloaded the latest version this week. Is this a function in an extension library, perhaps? EDIT: You may be thinking of PdfCopy.CopyAcroForm. I'll check that out.
Bad news: using PdfCopy.CopyAcroForm didn't work. It copied over all the JavaScript.
