I am writing a small script that needs to merge many one-page PDF files. I want the script to run with Python 3 and to have as few dependencies as possible.

For the PDF merging part, I tried using PyPdf. However, its Python 3 support seems to be buggy: it can't handle Inkscape-generated PDF files (which I need). I have the current git version of PyPdf installed, and the following test script doesn't work:

import PyPDF2

output_pdf = PyPDF2.PdfFileWriter()

with open("testI.pdf", "rb") as input:
    input_pdf = PyPDF2.PdfFileReader(input)
    output_pdf.addPage(input_pdf.getPage(0))

with open("test.pdf", "wb") as output:
    output_pdf.write(output)

It throws the following stack trace:

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    output.addPage(input.getPage(0))
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 420, in getPage
    self._flatten()
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 574, in _flatten
    self._flatten(page.getObject(), inherit)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 165, in getObject
    return self.pdf.getObject(self).getObject()
  File "/usr/lib/python3.3/site-packages/pyPdf/pdf.py", line 616, in getObject
    retval = readObject(self.stream, self)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 526, in readFromStream
    value = readObject(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 57, in readObject
    return ArrayObject.readFromStream(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 152, in readFromStream
    obj = readObject(stream, pdf)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 86, in readObject
    return NumberObject.readFromStream(stream)
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 231, in readFromStream
    return FloatObject(name.decode("ascii"))
  File "/usr/lib/python3.3/site-packages/pyPdf/generic.py", line 207, in __new__
    return decimal.Decimal.__new__(cls, str(value), context)
TypeError: optional argument must be a context

The same script, however, works flawlessly with Python 2.7.

What am I doing wrong here? Is it a bug in the library? Can I work around it without touching the PyPDF library?

2 Answers


So I found the answer. The decimal.Decimal class in Python 3.3 shows some weird behaviour. This is the corresponding StackOverflow question: "Instantiate Decimal class". I added a workaround to the PyPDF2 library and submitted a pull request.
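For anyone who wants to see the idea without digging through the pull request, here is a minimal sketch of that kind of workaround. It assumes the failure comes from handing an explicit None context to decimal.Decimal.__new__, as the last frames of the traceback suggest. The class name SafeFloat is hypothetical; it only stands in for a Decimal subclass like the library's FloatObject and is not the library's actual code.

```python
import decimal


class SafeFloat(decimal.Decimal):
    """Hypothetical stand-in for a Decimal subclass such as FloatObject."""

    def __new__(cls, value="0", context=None):
        # Forward the context only when the caller supplied one, so that
        # None is never passed explicitly to Decimal.__new__.
        if context is None:
            return decimal.Decimal.__new__(cls, str(value))
        return decimal.Decimal.__new__(cls, str(value), context)


if __name__ == "__main__":
    number = SafeFloat("595.276")
    print(type(number).__name__, number)
```

The point of the branch is simply that the two-argument form of Decimal.__new__ is used when no context was given, which sidesteps the check that raises the TypeError.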

5 Comments

Did it go through yet? I am suffering from the same problem.
I am sorry, I hit the wrong link. This is the correct (but still pending, god knows why) pull request to fix the bug: click. It's this repo.
Btw, since I have PyPDF2 installed, is there a way to just copy and paste some files?
This is the commit that fixes the error: click. So just copy the green line and add it to the proper place in your file system (probably /usr/lib/python2.7/site-packages/PyPDF2/generic.py:213).
Thank you so much! Even though I couldn't get the general thing to work for merging, I wrote a function that avoids the "PdfMerger".

Just to make sure you are aware of already existing tools that do exactly this:

  • PDFtk
  • PDFjam (my favourite, requires LaTeX though)
  • Directly with GhostScript:
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf file1.pdf file2.pdf
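If you still want to drive this from a Python 3 script, a rough sketch could wrap the same gs call with the standard library's subprocess module. It assumes gs is on your PATH; the function names build_gs_merge_command and merge_with_ghostscript are made up for this example, and the flags are exactly the ones from the command above.

```python
import subprocess


def build_gs_merge_command(output_path, input_paths):
    """Return the argument list for the gs invocation shown above."""
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-q",
        "-sDEVICE=pdfwrite",
        "-sOutputFile={}".format(output_path),
    ] + list(input_paths)


def merge_with_ghostscript(output_path, input_paths):
    # check=True raises CalledProcessError if gs exits with a non-zero status.
    subprocess.run(build_gs_merge_command(output_path, input_paths), check=True)


if __name__ == "__main__":
    # Assumes file1.pdf and file2.pdf exist in the current directory.
    merge_with_ghostscript("finished.pdf", ["file1.pdf", "file2.pdf"])
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, and it adds no Python dependency beyond the standard library.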

2 Comments

hi, thank you. But I find that PyPDF is a nice library and would like to use it. But thank you for the suggestions.
Just wanted to make sure you're not re-inventing the wheel when you don't actually mean to ;-)
