I have some .pdf files with more than 500 pages, but I need only a few pages from each file. The documents' title pages must be preserved. I know exactly which page numbers the program should remove. How can I do this using the Python 2.7 environment that is installed with MS Visual Studio?
2 Answers
Try using PyPDF2.
Instead of deleting pages, create a new document and add only the pages you want to keep.
Some sample code (originally adapted from BinPress, which is now dead; archived here):
from PyPDF2 import PdfWriter, PdfReader

pages_to_keep = [1, 2, 10]  # page numbering starts from 0
infile = PdfReader('source.pdf')  # PdfReader takes a path; no file mode argument
output = PdfWriter()

# Copy only the wanted pages into the new document
for i in pages_to_keep:
    p = infile.pages[i]
    output.add_page(p)

with open('newfile.pdf', 'wb') as f:
    output.write(f)
or
from PyPDF2 import PdfWriter, PdfReader

pages_to_delete = [3, 4, 5]  # page numbering starts from 0
infile = PdfReader('source.pdf')  # PdfReader takes a path; no file mode argument
output = PdfWriter()

# Copy every page except the ones marked for deletion
for i in range(len(infile.pages)):
    if i not in pages_to_delete:
        p = infile.pages[i]
        output.add_page(p)

with open('newfile.pdf', 'wb') as f:
    output.write(f)
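PDF viewers usually number pages from 1, while the snippets above index pages from 0. As a minimal sketch, assuming the page numbers to remove were read off a viewer, a small helper can turn them into the 0-based indices to keep. The name indices_to_keep is hypothetical and not part of PyPDF2.

```python
def indices_to_keep(total_pages, pages_to_remove):
    """Return the 0-based indices of the pages to keep.

    pages_to_remove holds 1-based page numbers, as shown in a PDF viewer
    (an assumption of this sketch).
    """
    remove = set(pages_to_remove)
    # Reject page numbers that do not exist in the document
    out_of_range = sorted(n for n in remove if not 1 <= n <= total_pages)
    if out_of_range:
        raise ValueError("page numbers out of range: %s" % out_of_range)
    return [i for i in range(total_pages) if i + 1 not in remove]


# Example: in a 6-page file, removing pages 4-6 keeps the first three pages.
print(indices_to_keep(6, [4, 5, 6]))  # [0, 1, 2]
```

The result can be used as pages_to_keep in the first snippet, with len(infile.pages) as total_pages.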
4 Comments
djvg
or just for i in pages_to_keep: ...?
Maximilian Peters
@Dennis: Thanks for noticing! Updated the answer, links and added another sentence of explanation.
theredcap
@MaximilianPeters I tried this, and the output PDF was about 0.3 MB larger than the input file for a 500-page book. What might be the reason for the increase in file size? I am just curious.
Jack Bosco
This worked, now I don't have to pay Adobe to remove a trailing blank page from Resume.pdf. Thank you!
As of 2023, another way to accomplish this is to use the PyMuPDF library. On Windows 11, you can install it from the command prompt like so:
pip install PyMuPDF
Once installed, you can use it as follows:
# Import library
import fitz
# Open the PDF file
doc = fitz.open("in_file.pdf")
# Keep only the first 6 pages (page numbering starts at 0)
doc.select([0, 1, 2, 3, 4, 5])
# Save the selected pages to a new PDF
doc.save("out_file_name.pdf")
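If the saved file is barely smaller than the original, it may help to let PyMuPDF clean up while saving. The following is a sketch, not a guaranteed fix: it passes the garbage and deflate options to doc.save, and how much space this recovers depends on the document. The function name keep_pages and the file paths are hypothetical.

```python
def keep_pages(src_path, dst_path, keep):
    """Write only the 0-based page indices in `keep` to dst_path."""
    # PyMuPDF is imported here so the function can be defined
    # even where the library is not installed.
    import fitz

    doc = fitz.open(src_path)
    try:
        doc.select(list(keep))
        # garbage=4 removes unused and duplicate objects;
        # deflate=True compresses uncompressed streams.
        doc.save(dst_path, garbage=4, deflate=True)
    finally:
        doc.close()
```

For example, keep_pages("in_file.pdf", "out_file_name.pdf", range(6)) would keep the first six pages.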
1 Comment
user3892260
Works great, but how come the resulting .pdf file is the same size as the original even though I'm only keeping 10% of the pages? Is there some way to reduce the final PDF size?