Access denied while downloading PDF using Python Requests

Question

I want to download PDFs with Python using the requests library. The following code works for some PDF documents but fails for a few of them.

from pathlib import Path
import requests

filename = Path('c:/temp.pdf')
url = 'https://www.rolls-royce.com/~/media/Files/R/Rolls-Royce/documents/investors/annual-reports/rr-full%20annual%20report--tcm92-55530.pdf'
response = requests.get(url,verify=False)
filename.write_bytes(response.content)

The following is the exact response (response.content). However, I can download the same document in Chrome without any error.

b'<HTML><HEAD>\n<TITLE>Access Denied</TITLE>\n</HEAD><BODY>\n<H1>Access Denied</H1>\n \nYou don\'t have permission to access "http&#58;&#47;&#47;www&#46;rolls&#45;royce&#46;com&#47;&#37;7e&#47;media&#47;Files&#47;R&#47;Rolls&#45;Royce&#47;documents&#47;investors&#47;annual&#45;reports&#47;rr&#45;full&#37;20annual&#37;20report&#45;&#45;tcm92&#45;55530&#46;pdf" on this server.<P>\nReference&#32;&#35;18&#46;36ad4d68&#46;1562842755&#46;6294c42\n</BODY>\n</HTML>\n'

Is there any way to get around this?

No, I did not try that. Can you tell me exactly which argument should be passed?
– Ravi Shah, Commented Jul 12, 2019 at 8:48

Ivan Vinogradov · Accepted Answer · 2019-07-12 10:44:04Z


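You get 403 Forbidden because requests, by default, sends a User-Agent header of the form python-requests/2.19.1 (the exact version depends on your installation), and the server denies requests that carry it.

Copy the User-Agent value your browser sends and pass it through the headers argument; the server should then accept the request.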

For example:

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 YaBrowser/19.6.1.153 Yowser/2.5 Safari/537.36'}
url = 'https://www.rolls-royce.com/~/media/Files/R/Rolls-Royce/documents/investors/annual-reports/rr-full%20annual%20report--tcm92-55530.pdf'

r = requests.get(url, headers=headers)
print(r.status_code)  # 200
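A 200 status alone does not prove that the body is the PDF, since a server can also return an HTML error page. The following is a minimal sketch, not a definitive implementation, that sends a custom User-Agent and writes the file only when the payload starts with the PDF signature. The names looks_like_pdf and download_pdf, the 30-second timeout, and the leading-whitespace tolerance are assumptions made for illustration.

```python
from pathlib import Path

# Every PDF file begins with this signature; an HTML error page does not.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(content: bytes) -> bool:
    """Return True if the payload starts with the PDF signature."""
    # Tolerating leading whitespace is an assumption, not a requirement.
    return content.lstrip().startswith(PDF_MAGIC)


def download_pdf(url: str, dest: Path, user_agent: str, timeout: float = 30.0) -> None:
    """Fetch url with a custom User-Agent and save it only if it is a PDF."""
    # Third-party dependency, imported here so looks_like_pdf works without it.
    import requests

    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses such as 403 Forbidden.
    response.raise_for_status()
    if not looks_like_pdf(response.content):
        content_type = response.headers.get("Content-Type")
        raise ValueError(f"Expected a PDF but received {content_type!r}")
    dest.write_bytes(response.content)
```

Calling download_pdf(url, Path('c:/temp.pdf'), headers['User-Agent']) with the values above would then either save the PDF or raise an exception, instead of silently writing an "Access Denied" page to disk.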


2 Comments

Ravi Shah Over a year ago

I received status code 200, but the PDF did not download successfully. However, when I use headers={'User-Agent': 'My Browser'}, the PDF downloads successfully.

Ravi Shah Over a year ago

However, I have another use case. I am not able to download the PDF at eisai.com/ir/library/annual/pdf/epdf2017ir.pdf, even with the headers mentioned earlier.
