How can I remove the first page of multiple PDF files in a directory? (Python)

Question

I need to remove the first page of multiple PDF files in a directory. I am an elementary-level Python user, and I have cobbled together the following code from bits and pieces of other code I have. However, I cannot get it to work. Does anything jump out at anyone?

from PyPDF2 import PdfFileWriter, PdfFileReader

import os, sys

directory_name = 'emma'


for filename in directory_name:
    print 'name: %s' % filename

    output_file = PdfFileWriter()
    input_handle = open(filename+'.pdf', 'rb')
    input_file = PdfFileReader(input_handle)

    num_pages = input_file.getNumPages()

    print "document has %s pages \n" % num_pages

    for i in xrange(1, num_pages):
        output_file.addPage(input_file.getPage(i))
        print 'added page %s \n' % i

    output_stream = file(filename+'-stripped.pdf','wb')
    output_file.write(output_stream)

    output_stream.close()
    input_handle.close()

Error message:

    input_handle = open(filename+'.pdf', 'rb')
IOError: [Errno 2] No such file or directory: 'a.pdf'

First of all, please specify what "cannot get it to work" means. Second, assuming the answer is "the resulting document is created but incomplete", examine the internals of the reader and writer objects (perhaps there's an underlying "document" object) to see what is missing in the second one. I guess it's additional entities besides pages.
– ivan_pozdeev, Commented Jun 29, 2013 at 13:56
Well, I am getting an error: input_handle = open(filename+'.pdf', 'rb') IOError: [Errno 2] No such file or directory: 'a.pdf'
– Em Clar, Commented Jun 29, 2013 at 14:24
Then it means exactly what it says: the OS cannot find the file path you passed to open(). It isn't even related to PyPDF2. Please do a reasonable amount of preliminary diagnostics and/or googling yourself before asking questions on the Net and making others waste their time on them.
– ivan_pozdeev, Commented Jun 29, 2013 at 15:06
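A quick way to confirm this kind of "No such file or directory" error is to check where a relative path actually resolves before calling open(). Below is a minimal Python 3 sketch; `describe_path` is a hypothetical helper for illustration, not part of PyPDF2 or the original code:

```python
import os


def describe_path(path):
    """Report where a relative path resolves to and whether it exists.

    describe_path is a hypothetical diagnostic helper, not a library API.
    Relative paths are resolved against the current working directory.
    """
    absolute = os.path.abspath(path)
    return {
        "cwd": os.getcwd(),
        "absolute": absolute,
        "exists": os.path.exists(absolute),
    }


if __name__ == "__main__":
    # For the error above, this shows which directory Python searched for a.pdf.
    print(describe_path("a.pdf"))
```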

Thomas Fenzl · Accepted Answer · 2013-06-29 15:29:44Z


Your code iterates over the characters of the string "emma", so it tries to open e.pdf, m.pdf (twice) and a.pdf. Since the error only occurs on a.pdf, the first two files must actually exist in the working directory, which is interesting in its own right.
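To see why, note that a `for` loop over a string yields its characters one at a time, not the entries of a directory with that name. A short Python 3 illustration of the file names the question's loop ends up building:

```python
directory_name = 'emma'

# Iterating over a str yields single characters, not directory entries.
attempted = [filename + '.pdf' for filename in directory_name]
print(attempted)  # ['e.pdf', 'm.pdf', 'm.pdf', 'a.pdf']
```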

As for your actual problem: you need to use os.listdir or glob to get the file names inside the directory.
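A minimal Python 3 sketch of both approaches, assuming the PDFs sit directly inside the directory (no subfolders) and have a lowercase `.pdf` extension. The function names are illustrative and not from the answer:

```python
import glob
import os


def pdfs_via_listdir(directory):
    """Return sorted paths of the .pdf files in directory, using os.listdir."""
    return sorted(
        os.path.join(directory, name)  # listdir returns bare names, so add the directory
        for name in os.listdir(directory)
        if name.endswith('.pdf')
    )


def pdfs_via_glob(directory):
    """Return sorted paths of the .pdf files in directory, using glob."""
    # The pattern includes the directory, so the results already include it too.
    return sorted(glob.glob(os.path.join(directory, '*.pdf')))
```

Either result can replace the loop's iterable, e.g. `for filename in pdfs_via_glob('emma'):`. Because the returned paths already contain the directory and the extension, the loop should pass them to open() directly instead of appending '.pdf' again. Note that glob matching is case-insensitive on Windows but case-sensitive on most other systems.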

answered Jun 29, 2013 at 15:22 by Thomas Fenzl, edited Jun 29, 2013 at 15:29


Philipp Stark · Accepted Answer · 2023-07-19 12:29:10Z

I adapted the code to Python 3, just in case somebody wants to use it:

from PyPDF2 import PdfWriter, PdfReader

import os, glob

os.chdir(r'data_path')  # placeholder: the directory that holds the PDFs
filename_lst = glob.glob('*.pdf')
print('number of files: {}'.format(len(filename_lst)))

# if you want to save the results somewhere else; this should be an existing
# directory other than data_path, otherwise the originals get overwritten
save_path = '...'

for filename in filename_lst:
    print('name: {}'.format(filename))

    output_file = PdfWriter()
    with open(filename, 'rb') as input_handle:
        input_file = PdfReader(input_handle)

        num_pages = len(input_file.pages)
        print("document has {} pages \n".format(num_pages))

        # copy every page except the first one (index 0)
        for i in range(1, num_pages):
            output_file.add_page(input_file.pages[i])

        # os.path.join inserts the path separator between directory and file name
        with open(os.path.join(save_path, filename), 'wb') as output_stream:
            output_file.write(output_stream)
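One thing to watch in the loop above: if the output path ends up identical to the input path, opening it with 'wb' can truncate the original while PyPDF2 is still reading it. A hedged sketch of computing all output paths up front with pathlib and refusing that case; `plan_outputs` is a hypothetical helper, and the `-stripped` suffix is borrowed from the question's code:

```python
from pathlib import Path


def plan_outputs(input_dir, save_dir, suffix='-stripped'):
    """Map each .pdf in input_dir to an output path in save_dir.

    Hypothetical helper: it only computes paths and never opens the PDFs.
    """
    input_dir = Path(input_dir).resolve()
    save_dir = Path(save_dir).resolve()
    pairs = []
    for src in sorted(input_dir.glob('*.pdf')):
        dst = save_dir / (src.stem + suffix + src.suffix)
        # Refuse to write over the file that is being read.
        if dst == src:
            raise ValueError('output would overwrite input: {}'.format(src))
        pairs.append((src, dst))
    return pairs
```

Each `(src, dst)` pair can then be used for the reader and writer in place of the paths built inside the loop.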
