
I have some .pdf files with more than 500 pages each, but I need only a few pages from each file. The title pages of each document must be preserved. I know exactly which page numbers the program should remove. How can I do this using a Python 2.7 environment installed with MS Visual Studio?


2 Answers

68

Try using PyPDF2.

Instead of deleting pages, create a new document and add all pages which you don't want to delete.

Some sample code (originally adapted from BinPress which is dead, archived here).

from PyPDF2 import PdfWriter, PdfReader

pages_to_keep = [1, 2, 10]  # page numbering starts from 0
# PdfReader takes a path or an open stream; it has no file mode argument
infile = PdfReader('source.pdf')
output = PdfWriter()

# Copy only the wanted pages into the new document
for i in pages_to_keep:
    p = infile.pages[i]
    output.add_page(p)

with open('newfile.pdf', 'wb') as f:
    output.write(f)

Or, if it is easier to list the pages to delete:

from PyPDF2 import PdfWriter, PdfReader

pages_to_delete = [3, 4, 5]  # page numbering starts from 0
# PdfReader takes a path or an open stream; it has no file mode argument
infile = PdfReader('source.pdf')
output = PdfWriter()

# Copy every page except the ones listed for deletion
for i in range(len(infile.pages)):
    if i not in pages_to_delete:
        p = infile.pages[i]
        output.add_page(p)

with open('newfile.pdf', 'wb') as f:
    output.write(f)
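
Since the question mentions page numbers as seen in a viewer and title pages that must survive, the same idea can be wrapped in a small helper. This is only a sketch: the names `indices_to_keep`, `remove_pages`, and `protected` are made up for illustration, and it assumes page numbers are given 1-based and that the title page is page 1.

```python
def indices_to_keep(num_pages, pages_to_delete, protected=(1,)):
    """Return the zero-based indices of the pages to keep.

    pages_to_delete and protected are 1-based page numbers (hypothetical
    convention for this sketch). Protected pages are kept even if they are
    listed for deletion.
    """
    delete = set(pages_to_delete) - set(protected)
    out_of_range = sorted(n for n in delete if not 1 <= n <= num_pages)
    if out_of_range:
        raise ValueError("page numbers out of range: %s" % out_of_range)
    return [i for i in range(num_pages) if i + 1 not in delete]


def remove_pages(src_path, dst_path, pages_to_delete):
    """Write a copy of src_path to dst_path without the listed pages."""
    # Imported here so indices_to_keep can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for i in indices_to_keep(len(reader.pages), pages_to_delete):
        writer.add_page(reader.pages[i])
    with open(dst_path, "wb") as f:
        writer.write(f)
```

For example, `remove_pages('source.pdf', 'newfile.pdf', [1, 3, 4])` would drop pages 3 and 4 but keep page 1, because it is protected by default.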

4 Comments

or just for i in pages_to_keep: ...?
@Dennis: Thanks for noticing! Updated the answer, links and added another sentence of explanation.
@MaximilianPeters I tried this, and the output PDF was slightly larger than the input, by about 0.3 MB for a 500-page book. What might be the reason for the increase in file size? I am just curious.
This worked, now I don't have to pay Adobe to remove a trailing blank page from Resume.pdf. Thank you!
5

As of 2023, another way to accomplish this is to use the PyMuPDF library. On Windows 11, you can install it from the command prompt like so:

pip install PyMuPDF

Once installed, you can use it like this:

# Import library
import fitz

# Open the PDF file
doc = fitz.open("in_file.pdf")

# For example, keep only the first 6 pages (the first page is 0)
doc.select([0, 1, 2, 3, 4, 5])

# Save the selected pages to a new PDF
doc.save("out_file_name.pdf")
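
If the saved file turns out no smaller than the original, one likely cause is that objects belonging to the dropped pages are still stored in the file. A possible variation, sketched below, passes PyMuPDF's `garbage` and `deflate` options to `save` so unreferenced objects are removed and streams are compressed. The helper names `parse_page_ranges` and `keep_pages` and the "1-3,10" range syntax are invented for this example, and how much the file shrinks depends on the document.

```python
def parse_page_ranges(spec):
    """Turn a 1-based spec such as "1-3,10" into zero-based page indices."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            # "2-4" covers pages 2, 3 and 4, i.e. indices 1, 2 and 3
            indices.extend(range(int(first) - 1, int(last)))
        else:
            indices.append(int(part) - 1)
    return indices


def keep_pages(src_path, dst_path, spec):
    """Keep only the pages named in spec and save a cleaned-up copy."""
    # Imported here so parse_page_ranges works without PyMuPDF installed.
    import fitz

    doc = fitz.open(src_path)
    try:
        doc.select(parse_page_ranges(spec))
        # garbage=4 removes unreferenced objects and merges duplicates;
        # deflate=True compresses uncompressed streams.
        doc.save(dst_path, garbage=4, deflate=True)
    finally:
        doc.close()
```

For example, `keep_pages("in_file.pdf", "out_file_name.pdf", "1-6")` keeps the same six pages as the snippet above.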

1 Comment

Works great, but how come the resulting .pdf file is the same size as the original even though I'm only keeping 10% of the pages? Is there some way to reduce the size of the final PDF?
