I am using meteor-react to upload PDF documents to my Node.js backend, where I want to read the uploaded PDF as JSON or some other structured format. Is that possible? And what library or tool would you recommend for that? Thank you!
1 Answer
There are a couple of Node packages for parsing PDF:
- pdf2json: https://www.npmjs.com/package/pdf2json
- pdfreader: https://www.npmjs.com/package/pdfreader
Check out their GitHub and documentation pages. It appears to me that pdf2json is the more complete solution, while pdfreader might be easier to get started with. You'll have to experiment and choose based on your project requirements.
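For reference, here is a minimal sketch of how pdf2json could be called on an uploaded file that is already in memory as a Buffer. It uses the `pdfParser_dataReady` and `pdfParser_dataError` events and the `parseBuffer` method from the pdf2json README. The helper name `parsePdfBuffer` is made up for illustration, and the parser class is passed in as an argument (an assumption of this sketch) so the helper does not hard-code the dependency.

```javascript
// Sketch: wrap pdf2json's event-based parser in a Promise.
// `PDFParser` is the class exported by the package, e.g.
//   const PDFParser = require("pdf2json");
// `parsePdfBuffer` is a hypothetical helper name, not part of pdf2json.
function parsePdfBuffer(PDFParser, buffer) {
  return new Promise((resolve, reject) => {
    const parser = new PDFParser();
    // pdf2json reports failures as an object carrying `parserError`.
    parser.on("pdfParser_dataError", (err) => reject(err.parserError || err));
    // On success the parsed document is delivered as a plain JS object.
    parser.on("pdfParser_dataReady", (pdfData) => resolve(pdfData));
    parser.parseBuffer(buffer);
  });
}
```

Usage would look something like `parsePdfBuffer(require("pdf2json"), uploadedBuffer).then(console.log)`. The exact shape of the resulting object depends on the pdf2json version, so inspect it before relying on particular fields.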
9 Comments
peter
I used the pdf2json package, and it was really easy to extract the PDF fields with their values. Thanks for the recommendation.
peter
The only problem: the PDF parser worked locally, but when we pushed to our test server, we got an error like: parserError: "An error occurred while parsing the PDF: InvalidPDFException". And it's really hard to find out what the problem is :/
Arash Motamedi
That's unfortunate. I suspect the library depends on an external library being installed on the machine. A couple of ideas come to mind for narrowing down the problem: 1. Can you create a VM on your local machine with specs similar to your server's (mostly the OS) and try running your code there? 2. Can you prototype a quick sample app using the other library and see if that one works when deployed to your server?
Arash Motamedi
And finally, if none of those work, can you please give us the complete error message (including the stack trace)? Maybe there's a hint in there that we can trace in the library's source code.
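One more thing that might be worth ruling out (this is a guess, not a confirmed diagnosis): the bytes that reach the parser on the server may no longer be a valid PDF, for example if the upload was truncated or converted to text on the way. A PDF file begins with the marker `%PDF-`, so a small check like the sketch below, run on both machines, would show whether the parser is getting the same input. The helper name `describePdfBuffer` is hypothetical.

```javascript
// Diagnostic sketch: does the uploaded buffer still look like a PDF?
// A PDF file starts with "%PDF-" followed by a version such as "1.7".
// A base64-encoded PDF starts with "JVBERi0" instead, which is a common
// sign that the upload was encoded as text somewhere along the way.
function describePdfBuffer(buffer) {
  return {
    byteLength: buffer.length,
    hasPdfHeader: buffer.subarray(0, 5).toString("latin1") === "%PDF-",
    // First bytes as text, handy for logging next to the error message.
    firstBytes: buffer.subarray(0, 16).toString("latin1"),
  };
}
```

Logging `describePdfBuffer(buffer)` right before parsing, locally and on the test server, should make any difference in the input visible.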
ibash
pdf2json is tragically bad code. You should avoid it. Unfortunately pdfreader depends on pdf2json too. Skip all that nonsense and use Mozilla's pdf.js
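For anyone going the pdf.js route, here is a rough sketch of extracting the text of each page. It relies on `getDocument`, `getPage` and `getTextContent` from the pdf.js API. How the `pdfjs-dist` package is imported in Node depends on its version, so the library object is passed in as a parameter (an assumption of this sketch), and `extractText` is a hypothetical helper name.

```javascript
// Sketch: extract plain text per page with pdf.js.
// `pdfjsLib` is the object exported by the pdfjs-dist package;
// `data` is the PDF content as a Uint8Array.
async function extractText(pdfjsLib, data) {
  const doc = await pdfjsLib.getDocument({ data }).promise;
  const pages = [];
  // pdf.js page numbers start at 1.
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    // Each text item carries its string in `str`; join them per page.
    pages.push(content.items.map((item) => item.str || "").join(" "));
  }
  return pages;
}
```

This returns one string per page. Unlike pdf2json it does not produce a JSON structure on its own, so any mapping of text to fields would have to be built on top.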