I have PDFs sent in from an external source that I want users to be able to view via a web service.

The PDFs are retrieved via a .NET Core service that gets them from the DB and outputs them as PDF files.

The problem is that malicious users can put JS in PDFs. Because the PDFs appear to the browser to come from the same origin, the JS can carry out XSS attacks on the rest of the application.

I don't need to retain any of the JS functionality, but I also want to keep the PDFs as unchanged as possible.

Is there a way, using .NET Core, to strip JS out of PDFs and leave them otherwise unchanged?

Alternatively, is there any way to specify that no JS should execute when opening PDF files embedded in web pages (for instance using `<iframe src="file.pdf">` or `<object type="application/pdf" data="file.pdf">`)? I can't rely on users having additional PDF extensions, so it would need to work with the vanilla browser.

  • You can use any proper general-purpose PDF library to remove JS. Commented Oct 24, 2017 at 4:21
  • @mkl cool, post a functional example of one that works with .NET Core in an answer and you can have some rep. Commented Oct 24, 2017 at 6:19
  • Which is the .NET Core PDF library of your choice? I cannot recommend one because (a) I've not yet dealt with .NET Core at all and so have no experience of which PDF libraries work properly there, and (b) library recommendations are off-topic here... Commented Oct 24, 2017 at 8:00
  • @mkl ah, so you think I'm asking for product recommendations? Hence the down/close votes? While there are plenty of PDF libraries out there most are not compatible with .NET Core and only a subset of those can make the changes I'm asking about. Even then that is non-trivial (exposing the internals of the PDF file format is a long way from knowing what can be removed). I'm not asking which way is best, I'm asking whether it's even practically possible, and if it's so simple that any of many possible libraries can do it then please feel free to provide a library-agnostic answer. Commented Oct 24, 2017 at 8:10
  • Both Bobrovsky's and Mihai Iancu's answers explain how to address the task, but as you see both of them advertise specific libraries in the process. So yes, in addition to asking for help on a specific task, your question is effectively asking for a library recommendation, which in turn most likely triggered the down votes and close votes. (Neither was from me; I hoped to nudge you to declare a PDF library of choice... ;) Commented Oct 24, 2017 at 11:17

2 Answers

To remove all the JavaScript from a PDF, you could start by removing all shared JavaScript. This is a special document-level collection of scripts, often used to define JavaScript functions that other scripts in the document can call.

Then you could find all actions in the document and check the type of each one. For JavaScript actions, you could replace the associated code with an empty string.
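The two steps above can be illustrated in a library-agnostic way. The sketch below is not a real PDF library API: it assumes a hypothetical in-memory object model in which PDF dictionaries are `Dictionary<string, object>` (keys are names without the leading `/`), arrays are `List<object>`, and names, strings and script code are plain C# strings. It removes the `/JavaScript` entry from the catalog's `/Names` dictionary, then walks every reachable object and blanks the `/JS` entry of any action whose `/S` is `JavaScript`. A real implementation would also have to deal with indirect references, streams (the `/JS` value may be a stream) and re-serialising the file, which is where most of the difficulty lies.

```csharp
using System.Collections.Generic;

// Hypothetical object model (NOT a real library API): dictionaries are
// Dictionary<string, object>, arrays are List<object>, names/strings are strings.
public static class JavaScriptStripper
{
    public static void Strip(Dictionary<string, object> catalog)
    {
        // Step 1: drop shared (document-level) JavaScript, i.e. /Names -> /JavaScript.
        if (catalog.TryGetValue("Names", out var names) &&
            names is Dictionary<string, object> namesDict)
        {
            namesDict.Remove("JavaScript");
        }

        // Step 2: visit every reachable object and neutralise JavaScript actions.
        Visit(catalog, new HashSet<object>());
    }

    private static void Visit(object node, HashSet<object> visited)
    {
        switch (node)
        {
            case Dictionary<string, object> dict:
                // Dictionaries use reference equality, so this guards against cycles
                // such as a page's /Parent pointing back to its /Pages node.
                if (!visited.Add(dict)) return;

                if (dict.TryGetValue("S", out var kind) && (kind as string) == "JavaScript")
                {
                    // Replace the action's code with an empty string.
                    dict["JS"] = "";
                }

                // Copy the values so the traversal never enumerates a mutated collection.
                foreach (var child in new List<object>(dict.Values))
                {
                    Visit(child, visited);
                }
                break;

            case List<object> list:
                if (!visited.Add(list)) return;
                foreach (var item in list)
                {
                    Visit(item, visited);
                }
                break;
        }
    }
}
```

Walking the whole object graph, rather than listing trigger events one by one, also reaches actions chained through `/Next` and those stored in `/AA` (additional-actions) dictionaries.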

This task is definitely not an easy one, so I recommend using a PDF library for it.

My company develops the Docotic.Pdf library, which can be used in .NET Standard / .NET Core and can help with your task.

The code below shows how to remove the JavaScript code from a PDF file using the XFINIUM.PDF library:

```csharp
// Namespaces as used by XFINIUM.PDF (assumed; verify against your package version).
using System.IO;
using Xfinium.Pdf;
using Xfinium.Pdf.Actions;
using Xfinium.Pdf.Annotations;
using Xfinium.Pdf.Forms;

// Hypothetical wrapper class so the methods below compile as a unit.
public class PdfJavaScriptRemover
{
    public void RemoveDocumentJavascript(Stream inputStream, Stream outputStream)
    {
        PdfFixedDocument doc = new PdfFixedDocument(inputStream);
        // Remove document-level (shared) JS code
        doc.JavaScriptBlocks.Clear();

        // Remove JS actions triggered by document events (open, close, save, print)
        RemoveDocumentActions(doc);

        // Remove JavaScript from annotations
        for (int i = 0; i < doc.Pages.Count; i++)
        {
            for (int j = 0; j < doc.Pages[i].Annotations.Count; j++)
            {
                RemoveAnnotationActions(doc.Pages[i].Annotations[j]);
            }
        }

        // Remove JavaScript from form fields (skipped when the document has no form)
        if (doc.Form != null)
        {
            for (int i = 0; i < doc.Form.Fields.Count; i++)
            {
                RemoveFieldActions(doc.Form.Fields[i]);
            }
        }

        doc.Save(outputStream);
    }

    private void RemoveDocumentActions(PdfFixedDocument doc)
    {
        if (doc.OpenAction is PdfJavaScriptAction)
        {
            doc.OpenAction = null;
        }
        if (doc.BeforeCloseAction is PdfJavaScriptAction)
        {
            doc.BeforeCloseAction = null;
        }
        if (doc.BeforeSaveAction is PdfJavaScriptAction)
        {
            doc.BeforeSaveAction = null;
        }
        if (doc.AfterSaveAction is PdfJavaScriptAction)
        {
            doc.AfterSaveAction = null;
        }
        if (doc.BeforePrintAction is PdfJavaScriptAction)
        {
            doc.BeforePrintAction = null;
        }
        if (doc.AfterPrintAction is PdfJavaScriptAction)
        {
            doc.AfterPrintAction = null;
        }
    }

    private void RemoveAnnotationActions(PdfAnnotation annotation)
    {
        // Page visibility triggers
        if (annotation.PageOpen is PdfJavaScriptAction)
        {
            annotation.PageOpen = null;
        }
        if (annotation.PageClose is PdfJavaScriptAction)
        {
            annotation.PageClose = null;
        }
        if (annotation.PageVisible is PdfJavaScriptAction)
        {
            annotation.PageVisible = null;
        }
        if (annotation.PageInvisible is PdfJavaScriptAction)
        {
            annotation.PageInvisible = null;
        }
        // Mouse triggers
        if (annotation.MouseDown is PdfJavaScriptAction)
        {
            annotation.MouseDown = null;
        }
        if (annotation.MouseUp is PdfJavaScriptAction)
        {
            annotation.MouseUp = null;
        }
        if (annotation.MouseEnter is PdfJavaScriptAction)
        {
            annotation.MouseEnter = null;
        }
        if (annotation.MouseLeave is PdfJavaScriptAction)
        {
            annotation.MouseLeave = null;
        }
        // Link annotations carry their own action
        PdfLinkAnnotation link = annotation as PdfLinkAnnotation;
        if ((link != null) && (link.Action is PdfJavaScriptAction))
        {
            link.Action = null;
        }
    }

    private void RemoveFieldActions(PdfField field)
    {
        // The PDF specification defines these field triggers (calculate, format,
        // keystroke, validate) as JavaScript actions, so they are cleared unconditionally.
        field.CalculateAction = null;
        field.FormatAction = null;
        field.KeyPressAction = null;
        field.ValidateAction = null;

        // Focus/blur triggers on each widget of the field
        for (int i = 0; i < field.Widgets.Count; i++)
        {
            if (field.Widgets[i].Focus is PdfJavaScriptAction)
            {
                field.Widgets[i].Focus = null;
            }
            if (field.Widgets[i].Blur is PdfJavaScriptAction)
            {
                field.Widgets[i].Blur = null;
            }
        }
    }
}
```
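To spot-check a sanitised file, a crude byte-level scan for the `/JS` and `/JavaScript` name tokens can help. This is a hypothetical helper (`PdfScriptScan` is not part of XFINIUM.PDF or any other library) and only a heuristic: it cannot see inside compressed object streams and does not decode `#xx`-escaped names, so a negative result is not proof that a file is script-free.

```csharp
using System.Text;

// Hypothetical heuristic: reports whether the raw PDF bytes contain the
// /JS or /JavaScript name tokens. Misses compressed streams and escaped names.
public static class PdfScriptScan
{
    // PDF whitespace and delimiter characters; a name token ends at any of these.
    private const string Delimiters = " \t\r\n\f\0()<>[]{}/%";

    public static bool ContainsScriptName(byte[] pdf) =>
        ContainsName(pdf, "/JS") || ContainsName(pdf, "/JavaScript");

    private static bool ContainsName(byte[] data, string name)
    {
        byte[] token = Encoding.ASCII.GetBytes(name);
        for (int i = 0; i + token.Length <= data.Length; i++)
        {
            int j = 0;
            while (j < token.Length && data[i + j] == token[j]) j++;
            if (j < token.Length) continue;

            // Require a token boundary so that e.g. "/JSON" does not count as "/JS".
            int end = i + token.Length;
            if (end == data.Length || Delimiters.IndexOf((char)data[end]) >= 0)
            {
                return true;
            }
        }
        return false;
    }
}
```

For example, `PdfScriptScan.ContainsScriptName(File.ReadAllBytes("out.pdf"))` could be run on the output of `RemoveDocumentJavascript`; a `true` result means something was missed.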

The library supports .NET Core and it is available on nuget.org (id: xfinium.pdf.netcore).
Unless you implement your own PDF parsing and saving code, you cannot do this without a third-party library.

Disclaimer: I work for the company that develops XFINIUM.PDF library.
