How to "write to variable" instead of "to file" in Python

Question

I'm trying to write a function that splits a PDF into separate pages. I copied the following simple function from an SO answer:

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            output.write(outputStream)
    return pages

This, however, writes the new PDFs to files instead of returning a list of the new PDFs as file variables. So I changed the line output.write(outputStream) to:

pages.append(outputStream)

When I then try to write the elements of the pages list, however, I get a ValueError: I/O operation on closed file.

Does anybody know how I can add the new files to the list and return them, instead of writing them to file? All tips are welcome!

Have you tried reading the data, rather than storing the file handle - pages.append(outputStream.read())?
– jonrsharpe, Commented Oct 23, 2014 at 13:32
Have you tried using cStringIO.StringIO to open outputStream?
– user4815162342, Commented Oct 23, 2014 at 13:37
What the user above said: you can usually substitute a StringIO object for a file and get the result out as a string that way.
– Anentropic, Commented Oct 23, 2014 at 13:40
@jonrsharpe - I just tried it, and that gives me an IOError: File not open for reading on the line saying pages.append(outputStream.read()). Any other ideas?
– kramer65, Commented Oct 23, 2014 at 13:40
@user4815162342 - Ehm, no, I haven't tried StringIO. Any tips on how to do that? A code example would be very welcome. :)
– kramer65, Commented Oct 23, 2014 at 13:41
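
Both errors mentioned so far come from the state of the file handle rather than from the PDF library. A minimal sketch in Python 3, using only the standard library (the file name demo.bin is made up for illustration): a handle stored inside a with block is closed once the block exits, and a handle opened in "wb" mode cannot be read from. In Python 3 the second case raises io.UnsupportedOperation rather than the Python 2 IOError quoted above.

```python
import io
import os
import tempfile

handles = []
errors = []

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")  # hypothetical file name

    with open(path, "wb") as stream:
        handles.append(stream)  # stores the handle, not the data
        try:
            stream.read()  # "wb" opens the file write-only
        except io.UnsupportedOperation as exc:
            errors.append(type(exc).__name__)

    # The with block has closed the handle by this point.
    try:
        handles[0].write(b"more data")
    except ValueError as exc:
        errors.append(type(exc).__name__)

print(handles[0].closed, errors)
```

So the list needs to hold either the data itself or a stream that stays open, which is what the answers below do.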

parchment · Accepted Answer · 2014-10-23 14:07:26Z

7

You can use the in-memory binary streams in the io module. This keeps each page's PDF data in memory instead of writing it to disk.

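import io

# Assumption: the question's classes come from the legacy PyPDF2 (< 3.0) API
from PyPDF2 import PdfFileReader, PdfFileWriter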

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        outputStream = io.BytesIO()

        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        output.write(outputStream)

        # Move the stream position to the beginning,
        # making it easier for other code to read
        outputStream.seek(0)

        pages.append(outputStream)
    return pages
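
The seek(0) call matters because writing leaves the stream position at the end of the data. A small sketch with placeholder bytes standing in for what output.write(outputStream) would produce:

```python
import io

stream = io.BytesIO()
stream.write(b"%PDF-1.4 placeholder bytes")  # stand-in for output.write(stream)

at_end = stream.read()          # position is at the end, so nothing comes back
stream.seek(0)                  # rewind to the start
from_start = stream.read()      # now the full contents are returned
everything = stream.getvalue()  # getvalue() ignores the position entirely
```

If the caller only needs the raw bytes, getvalue() is an alternative that does not depend on the stream position.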

To later write the objects to a file, use shutil.copyfileobj:

import shutil

with open('page0.pdf', 'wb') as out:
    shutil.copyfileobj(pages[0], out)
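
The same call extends to the whole list. A minimal sketch, assuming the input is a list of io.BytesIO objects like the one splitPdf returns; the helper name save_streams and the file-name pattern are made up for illustration.

```python
import io
import os
import shutil


def save_streams(streams, directory, pattern="document-page%s.pdf"):
    """Copy each in-memory stream to its own file and return the paths."""
    paths = []
    for i, stream in enumerate(streams):
        stream.seek(0)  # copyfileobj copies from the current position
        path = os.path.join(directory, pattern % i)
        with open(path, "wb") as out:
            shutil.copyfileobj(stream, out)
        paths.append(path)
    return paths
```

Calling save_streams(pages, ".") would then write document-page0.pdf, document-page1.pdf, and so on into the current directory.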

answered Oct 23, 2014 at 14:07


user4815162342 · Answer · 2014-10-23 18:53:40Z

6

It is not completely clear what you mean by "list of PDFs as file variables". If you want to create strings with the PDF contents instead of files, and return a list of such strings, replace open() with StringIO and call getvalue() to obtain the contents:

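import cStringIO  # Python 2 only

# Assumption: the question's classes come from the legacy PyPDF2 (< 3.0) API
from PyPDF2 import PdfFileReader, PdfFileWriter

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        # In-memory buffer that stands in for the file on disk
        buf = cStringIO.StringIO()
        output.write(buf)
        # Store the page's PDF contents as a string
        pages.append(buf.getvalue())
    return pages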

edited Oct 23, 2014 at 18:53

answered Oct 23, 2014 at 14:36


2 Comments

Garrett Over a year ago

(This answer is Python 2 only)

user4815162342 Over a year ago

@Garrett It should be quite straightforward to adapt to Python 3, though.
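
For Python 3, where cStringIO no longer exists, the adaptation mostly amounts to swapping in io.BytesIO, since PDF data is binary. A minimal sketch of the pattern, kept free of any PDF library so it runs on its own: to_bytes is a hypothetical helper, and FakeWriter is a stand-in for any object with a write(stream) method, such as a PdfFileWriter.

```python
import io


def to_bytes(writer):
    """Return whatever writer.write(stream) emits, as a bytes object."""
    buf = io.BytesIO()
    writer.write(buf)
    return buf.getvalue()  # full contents, regardless of stream position


class FakeWriter:
    """Stand-in for a PDF writer; it only mimics the write(stream) method."""

    def __init__(self, payload):
        self.payload = payload

    def write(self, stream):
        stream.write(self.payload)


pages = [to_bytes(FakeWriter(b"page %d" % i)) for i in range(3)]
```

Inside splitPdf, pages.append(to_bytes(output)) would take the place of the three lines that create the buffer, write to it, and call getvalue(). The list then holds bytes objects rather than Python 2 strings.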

Werner · Answer · 2014-10-23 14:24:27Z

1

I haven't used PdfFileWriter, but I think this should work: return the writer objects themselves and write them to files later.

def splitPdf(file_):
    pdf = PdfFileReader(file_)
    pages = []
    for i in range(pdf.getNumPages()):
        output = PdfFileWriter()
        output.addPage(pdf.getPage(i))
        pages.append(output)
    return pages

def writePdf(pages):
    # Number the output files starting from 1
    for i, p in enumerate(pages, start=1):
        with open("document-page%s.pdf" % i, "wb") as outputStream:
            p.write(outputStream)

answered Oct 23, 2014 at 14:24
