How do I change hyperlinks inside a PDF using Python?

Question

How do I change the hyperlinks in a PDF using Python? I am currently using PyPDF2 to open the file and loop through the pages. How do I scan for hyperlinks and then change them?

Sadderdaze · Accepted Answer · 2020-09-25 04:27:22Z

8

I couldn't get this working with the PyPDF2 library.

I did, however, get something working with another library, pdfrw. It installed fine for me using pip on Python 3.6:

pip install pdfrw

Note: I tested the following with an example PDF I found online that contains multiple links. Your mileage may vary.

import pdfrw

pdf = pdfrw.PdfReader("pdf.pdf")  # Load the pdf
new_pdf = pdfrw.PdfWriter()  # Create an empty pdf

for page in pdf.pages:  # Go through the pages

    # Links are in Annots, but some pages don't have links so Annots returns None
    for annot in page.Annots or []:

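        # Skip annotations that have no action, or whose action has no URI
        # (pdfrw returns None for missing keys)
        if annot.A is None or annot.A.URI is None:
            continue

        old_url = annot.A.URI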

        # >Here you put logic for replacing the URLs<
        
        # Use the PdfString object to do the encoding for us
        # Note the brackets around the URL here
        new_url = pdfrw.objects.pdfstring.PdfString("(http://www.google.com)")

        # Override the URL with ours
        annot.A.URI = new_url

    new_pdf.addpage(page)    

new_pdf.write("new.pdf")
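The placeholder for the replacement logic in the loop above could be filled in along these lines. This is only a sketch using the standard library: `swap_host` and `to_pdf_literal` are hypothetical helper names, not part of pdfrw, and the host names are made-up examples. It assumes you already have the old URL as plain text, without the surrounding brackets.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_host(url: str, old_host: str, new_host: str) -> str:
    """Return url with its host replaced when it matches old_host (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.hostname != old_host:
        return url  # leave unrelated links untouched
    # Keep an explicit port if the original URL had one.
    # Any user:password@ prefix is dropped in this sketch.
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def to_pdf_literal(text: str) -> str:
    """Wrap text in brackets as a PDF literal string, escaping backslashes and brackets."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


new_text = swap_host("http://old.example.com/docs?page=2", "old.example.com", "new.example.com")
print(to_pdf_literal(new_text))  # prints (http://new.example.com/docs?page=2)
```

The bracketed string returned by `to_pdf_literal` has the same shape as the hard-coded `"(http://www.google.com)"` passed to `PdfString` in the loop.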

edited Sep 25, 2020 at 4:27

Sadderdaze


answered Jul 19, 2017 at 16:13

alxwrd


1 Comment

MURTUZA BORIWALA Over a year ago

For some reason, I am able to detect the URLs but unable to override them. I used the exact same code, but something is not right. Could you help me out?

User9123 · Accepted Answer · 2021-10-19 01:58:15Z

2

I managed to get it working with PyPDF2.

If you just want to remove all annotations from a page, do:

if '/Annots' in page: del page['/Annots']

Otherwise, here is how to change each link:

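# Written against the older camelCase PyPDF2 API (PdfFileReader, getPage, ...);
# PyPDF2 3.0 replaced these names with PdfReader, reader.pages, add_page, etc.
import PyPDF2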

new_link = "https://www.youtube.com/watch?v=dQw4w9WgXcQ" # great video by the way

pdf_reader = PyPDF2.PdfFileReader("input.pdf")
pdf_writer = PyPDF2.PdfFileWriter()

for i in range(pdf_reader.getNumPages()):
    page = pdf_reader.getPage(i)
    
    if '/Annots' not in page: continue
    for annot in page['/Annots']:
        annot_obj = annot.getObject()
        if '/A' not in annot_obj: continue  # not a link
        if '/URI' not in annot_obj['/A']: continue  # action is not a URI link (e.g. an internal jump)
        # you have to wrap the key and value with a TextStringObject:
        key   = PyPDF2.generic.TextStringObject("/URI")
        value = PyPDF2.generic.TextStringObject(new_link)
        annot_obj['/A'][key] = value
    
    pdf_writer.addPage(page)

with open('output.pdf', 'wb') as f:
    pdf_writer.write(f)
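The `'/A' not in annot_obj` check above skips annotations without an action, but an action is not necessarily a web link. As a sketch, a stricter filter could look like the following. `is_uri_link` is a hypothetical helper name; it relies on the PDF convention that link annotations carry `/Subtype /Link` and URI actions carry `/S /URI`, and it is shown here on plain dictionaries standing in for real PyPDF2 objects.

```python
def is_uri_link(annot) -> bool:
    """Return True if a dict-like annotation is a link with a URI action (hypothetical helper)."""
    if "/Subtype" not in annot or annot["/Subtype"] != "/Link":
        return False  # some other annotation type, e.g. a text note
    if "/A" not in annot:
        return False  # link without an action (e.g. one that uses /Dest)
    action = annot["/A"]
    return "/S" in action and action["/S"] == "/URI" and "/URI" in action


# Plain dicts standing in for annotation objects
web_link = {"/Subtype": "/Link", "/A": {"/S": "/URI", "/URI": "http://example.com"}}
internal_jump = {"/Subtype": "/Link", "/A": {"/S": "/GoTo", "/D": "chapter1"}}
note = {"/Subtype": "/Text", "/Contents": "a comment"}

print([is_uri_link(a) for a in (web_link, internal_jump, note)])  # prints [True, False, False]
```

Only annotations that pass such a check would then have their `/URI` value overwritten, which leaves internal jumps and notes alone.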

An equivalent one-liner for a given page index i and annotation index j would be:

pdf_reader.getPage(i)['/Annots'][j].getObject()['/A'][PyPDF2.generic.TextStringObject("/URI")] = PyPDF2.generic.TextStringObject(new_link)

answered Oct 19, 2021 at 1:58

User9123


1 Comment

Hassan Anwer Over a year ago

How can I make the URL open in a new tab? Currently it opens in the same window when the PDF is viewed in a browser.
