I have a PDF file which I want to read into memory using Node.js. Ideally I'd like to encode it as base64 for transfer. But somehow the read function does not seem to read the full PDF file, which makes no sense to me. The original PDF was generated using PDFKit, and it is fine and viewable in a PDF reader program.

The original file test.pdf is 90 kB on disk. But if I read it and write it back to disk, the result is only 82 kB and the new PDF test-out.pdf is broken. The PDF viewer says:

Unable to open document. The pdf document is damaged.

The base64 encoding therefore does not work correctly either. I tested it using this webservice. Does someone know what is happening here, why, and how to resolve it?

I found this post already.

const fs = require('fs');
let buf = fs.readFileSync('test.pdf'); // returns raw buffer binary data
// buf = fs.readFileSync('test.pdf', {encoding:'base64'}); // for the base64 encoded data
// ...transfer the base64 data...
fs.writeFileSync('test-out.pdf', buf); // should be pdf again

EDIT MCVE:

const fs = require('fs');
const PDFDocument = require('pdfkit');

let filepath = 'output.pdf';

class PDF {
  constructor() {
    this.doc = new PDFDocument();
    this.setupdocument();
    this.doc.pipe(fs.createWriteStream(filepath));
  }

  setupdocument() {
    var pageNumber = 1;
    this.doc.on('pageAdded', () => {
        this.doc.text(++pageNumber, 0.5 * (this.doc.page.width - 100), 40, {width: 100, align: 'center'});
      }
    );

    this.doc.moveDown();
    // draw some headline text
    this.doc.fontSize(25).text('Some Headline');
    this.doc.fontSize(15).text('Generated: ' + new Date().toUTCString());
    this.doc.moveDown();
    this.doc.font('Times-Roman', 11);
  }

  report(object) {

    this.doc.moveDown();
    this.doc
      .text(object.location+' '+object.table+' '+Date.now())
      .font('Times-Roman', 11)
      .moveDown()
      .text(object.name)
      .font('Times-Roman', 11);

    this.doc.end();
    let report = fs.readFileSync(filepath);
    return report;
  }
}

let pdf = new PDF();
let buf = pdf.report({location: 'athome', table:'wood', name:'Bob'});
fs.writeFileSync('outfile1.pdf', buf);

2 Answers


The encoding option for fs.readFileSync() controls how the bytes read from the file are turned into a string. With a text encoding such as 'utf8', the bytes are interpreted as text in that encoding, which corrupts arbitrary binary data. With 'base64' or 'hex', you get back a string that holds the file's bytes in that encoding. In every case the result is a string, not a Buffer.

In this case, your PDF is binary. If you read it with an encoding and later write the resulting string back out without decoding it with the same encoding, the bytes on disk will not match the original file.

You should not be passing the encoding option at all; you will then get the raw binary Buffer (which is what a PDF file is: raw binary). If you then want to convert that to base64 for transfer, you can call buf.toString('base64') on it. But that is not its native format, and if you write that converted data back out to disk as-is, it won't be a valid PDF file.

To just read and write the same file out to a different filename, leave off the encoding option entirely:

const fs = require('fs');
let buf = fs.readFileSync('test.pdf'); // get raw buffer binary data
fs.writeFileSync('test-out.pdf', buf); // write out raw buffer binary data
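
If the data has to travel as base64 in between, a round trip might look like the minimal sketch below. The helper names fileToBase64 and base64ToFile are made up for illustration, and the demo uses a few stand-in bytes in a temporary directory rather than a real PDF.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: read a file's raw bytes and encode them as base64.
function fileToBase64(filePath) {
  const buf = fs.readFileSync(filePath); // no encoding option: returns a Buffer
  return buf.toString('base64');
}

// Hypothetical helper: decode base64 back to bytes and write them to disk.
function base64ToFile(base64, filePath) {
  fs.writeFileSync(filePath, Buffer.from(base64, 'base64'));
}

// Demo with stand-in binary data instead of a real PDF.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'b64-demo-'));
const src = path.join(dir, 'in.bin');
const dst = path.join(dir, 'out.bin');
const original = Buffer.from([0x25, 0x50, 0x44, 0x46, 0x00, 0xff, 0x80, 0x0a]);
fs.writeFileSync(src, original);

const encoded = fileToBase64(src); // ...transfer this string...
base64ToFile(encoded, dst);

// Compare the written file with the original bytes.
const roundTripOk = fs.readFileSync(dst).equals(original);
console.log(roundTripOk);
```

The important part is decoding with Buffer.from(base64, 'base64') before writing, so the file on disk contains the original bytes and not the base64 text.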

3 Comments

The default encoding is 'utf8' in read/write operations so do we need to use 'binary' when working with binary files like pdf, image etc?
@Viney - Per the fs.readFileSync() documentation, the default is null which means to just leave it as whatever it is - don't try to interpret it which is what you want for binary.
Thanks for the confirmation and explanation. The main problem is that readFileSync somehow doesn't read the whole file. In the MCVE I just added to my question nothing is read, and in my real document only the same 82 kB are read, regardless of how big my file actually is (without error). Do you know why? It works as expected with other PDFs (e.g. ones generated by MS Word) but not with the PDFKit ones. But as stated in the question, the file generated by PDFKit, output.pdf, can be opened normally and seems to be completely fine. outfile1.pdf fails.

After a lot of searching I found this GitHub issue. The problem in my question is that doc.end() returns before the piped write stream has finished writing to disk (it does not wait for the write stream's finish event), so reading the file immediately afterwards gets incomplete data. As suggested in the GitHub issue, the following approaches work:

  • callback based:
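const fs = require('fs');
const PDFDocument = require('pdfkit');

const doc = new PDFDocument();
const writeStream = fs.createWriteStream('filename.pdf');
doc.pipe(writeStream);
doc.end();
// 'finish' fires once all data has been flushed to the file
writeStream.on('finish', function () {
    // do stuff with the PDF file
});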
  • or promise based:
const stream = fs.createWriteStream(localFilePath);
doc.pipe(stream);
.....
doc.end();
await new Promise<void>(resolve => {
  stream.on("finish", function() {
    resolve();
  });
});
  • or even nicer, instead of calling doc.end() directly, call the function savePdfToFile below:
function savePdfToFile(pdf : PDFKit.PDFDocument, fileName : string) : Promise<void> {
  return new Promise<void>((resolve, reject) => {

    //  To determine when the PDF has finished being written successfully
    //  we need to confirm the following 2 conditions:
    //
    //  1. The write stream has been closed
    //  2. PDFDocument.end() was called synchronously without an error being thrown

    let pendingStepCount = 2;

    const stepFinished = () => {
      if (--pendingStepCount === 0) {
        resolve();
      }
    };

    const writeStream = fs.createWriteStream(fileName);
    writeStream.on('close', stepFinished);
    // Reject if the file cannot be written (e.g. invalid path)
    writeStream.on('error', reject);
    pdf.pipe(writeStream);

    // If end() throws, the promise executor turns that into a rejection
    pdf.end();

    stepFinished();
  });
}

This function should correctly handle the following situations:

  • PDF generated successfully
  • Error is thrown inside pdf.end() before write stream is closed
  • Error is thrown inside pdf.end() after write stream has been closed
