I need to remove the first page from each of several PDF files in a directory. I am a beginner-level Python user and have cobbled together the following code from bits and pieces of other code. However, I cannot get it to work. Does anything jump out at anyone?

from PyPDF2 import PdfFileWriter, PdfFileReader

import os, sys

directory_name = 'emma'


for filename in directory_name:
    print 'name: %s' % filename

    output_file = PdfFileWriter()
    input_handle = open(filename+'.pdf', 'rb')
    input_file = PdfFileReader(input_handle)

    num_pages = input_file.getNumPages()

    print "document has %s pages \n" % num_pages

    for i in xrange(1, num_pages):
        output_file.addPage(input_file.getPage(i))
        print 'added page %s \n' % i

    output_stream = file(filename+'-stripped.pdf','wb')
    output_file.write(output_stream)

    output_stream.close()
    input_handle.close()

Error message:

    input_handle = open(filename+'.pdf', 'rb')
        IOError: [Errno 2] No such file or directory: 'a.pdf'
  • First of all, please specify the meaning of "cannot get it to work". Second, assuming the answer to the 1st question is "the resulting document is created but incomplete", examine the internals of reader and writer objects (perhaps, there's an underlying "document" object) to see what is missing in the 2nd one. I guess it's additional entities besides pages. Commented Jun 29, 2013 at 13:56
  • Well, I am getting an error, which is: input_handle = open(filename+'.pdf', 'rb') IOError: [Errno 2] No such file or directory: 'a.pdf' Commented Jun 29, 2013 at 14:24
  • Then it's exactly what it reads: the OS cannot find the file path you passed to the open() call. It's not even connected to PyPDF2. Please do a reasonable amount of preliminary diagnostics and/or googling yourself before asking questions on the Net and making others waste their time on them. Commented Jun 29, 2013 at 15:06

2 Answers

Your code iterates over the string "emma" character by character, so it tries to open e.pdf, m.pdf (twice), and a.pdf. The error on a.pdf means the first two actually exist, which is interesting enough on its own.
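To see why, note that a for loop over a string visits its characters one at a time. A minimal sketch (the helper name candidate_paths is hypothetical and exists only for this illustration):

```python
def candidate_paths(directory_name):
    """Return the file names the question's loop would try to open."""
    # Iterating over a string yields single characters, not directory entries.
    return [ch + '.pdf' for ch in directory_name]


print(candidate_paths('emma'))  # ['e.pdf', 'm.pdf', 'm.pdf', 'a.pdf']
```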

But to solve your problem, you need to use os.listdir or glob to actually get the filenames within the directory.
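A minimal sketch of both approaches, assuming the PDFs sit directly inside a directory named emma as in the question. The helper names list_pdfs_glob and list_pdfs_listdir are made up for this example:

```python
import glob
import os


def list_pdfs_glob(directory_name):
    """Return sorted paths of the .pdf files directly inside directory_name."""
    # glob.glob returns paths that already include the directory prefix.
    return sorted(glob.glob(os.path.join(directory_name, '*.pdf')))


def list_pdfs_listdir(directory_name):
    """Similar result, using os.listdir plus an explicit extension check."""
    # os.listdir returns bare names, so join them with the directory yourself.
    return sorted(
        os.path.join(directory_name, name)
        for name in os.listdir(directory_name)
        if name.endswith('.pdf')
    )


if __name__ == '__main__':
    # Only runs the demo if the question's directory actually exists.
    if os.path.isdir('emma'):
        for path in list_pdfs_glob('emma'):
            print('name: %s' % path)
```

Either list can replace directory_name in the question's for loop. Because each entry is already a full path ending in .pdf, it can be passed to open() as-is, without appending '.pdf' again. One difference worth knowing: glob skips names that start with a dot, while os.listdir does not.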

I adapted the code to Python 3, just in case somebody wants to use it:

from PyPDF2 import PdfWriter, PdfReader

import os, glob

os.chdir(r'data_path')
filename_lst = glob.glob('*.pdf')
print('number of files: {}'.format(len(filename_lst)))

save_path = '...' # if you want to save the results somewhere else

for filename in filename_lst:
    print('name: {}'.format(filename))

    output_file = PdfWriter()
    input_handle = open(filename, 'rb')
    input_file = PdfReader(input_handle)

    num_pages = len(input_file.pages)

    print("document has {} pages \n".format(num_pages))

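    # skip index 0 (the first page) and copy the rest
    for i in range(1, num_pages):
        output_file.add_page(input_file.pages[i])

    # os.path.join avoids depending on a trailing separator in save_path
    output_stream = open(os.path.join(save_path, filename), 'wb')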
    output_file.write(output_stream)

    output_stream.close()
    input_handle.close()
