How do I change the hyperlinks in pdf using python? I am currently using a pyPDF2 to open up and loop through the pages. How do I actually scan for hyperlinks and then proceed to change the hyperlinks?
2 Answers
So I couldn't get what you want using the pyPDF2 library.
I did however get something working with another library: pdfrw. This installed fine for me using pip in Python 3.6:
pip install pdfrw
Note: for the following I have been using this example pdf I found online which contains multiple links. Your mileage may vary with this.
import pdfrw
pdf = pdfrw.PdfReader("pdf.pdf") # Load the pdf
new_pdf = pdfrw.PdfWriter() # Create an empty pdf
for page in pdf.pages: # Go through the pages
# Links are in Annots, but some pages don't have links so Annots returns None
for annot in page.Annots or []:
old_url = annot.A.URI
# >Here you put logic for replacing the URLs<
# Use the PdfString object to do the encoding for us
# Note the brackets around the URL here
new_url = pdfrw.objects.pdfstring.PdfString("(http://www.google.com)")
# Override the URL with ours
annot.A.URI = new_url
new_pdf.addpage(page)
new_pdf.write("new.pdf")
1 Comment
MURTUZA BORIWALA
For some reasons, I am able to detect the URLs but unable to override it. I use the exact same code but something is not right. Possible to help me out?
I managed to get it working with PyPDF2.
If you just want to remove all annotations for a page, you just have to do:
if '/Annots' in page: del page['/Annots']
Else, here is how you change each link:
import PyPDF2
new_link = "https://www.youtube.com/watch?v=dQw4w9WgXcQ" # great video by the way
pdf_reader = PyPDF2.PdfFileReader("input.pdf")
pdf_writer = PyPDF2.PdfFileWriter()
for i in range(pdf_reader.getNumPages()):
page = pdf_reader.getPage(i)
if '/Annots' not in page: continue
for annot in page['/Annots']:
annot_obj = annot.getObject()
if '/A' not in annot_obj: continue # not a link
# you have to wrap the key and value with a TextStringObject:
key = PyPDF2.generic.TextStringObject("/URI")
value = PyPDF2.generic.TextStringObject(new_link)
annot_obj['/A'][key] = value
pdf_writer.addPage(page)
with open('output.pdf', 'wb') as f:
pdf_writer.write(f)
An equivalent one-liner for a given page index i and annotation index j would be:
pdf_reader.getPage(i)['/Annots'][j].getObject()['/A'][PyPDF2.generic.TextStringObject("/URI")] = PyPDF2.generic.TextStringObject(new_link)
1 Comment
Hassan Anwer
how to open that URL in a new tab, currently it is opening in the same window when opened in browser.