
I'm trying to get the HTML of this page:

    url = 'http://www.metacritic.com/movie/oslo-august-31st/critic-reviews'

and I'm fetching it with requests:

    import requests
    oslo = requests.get(url)

but the site seems to know that I'm accessing it this way, and when I open the response I get Varnish's 403 page:

    403 Forbidden

    Error 403 Forbidden
    Forbidden
    Guru Meditation:
    XID: 961167012
    Varnish cache server

Is there any way to access the HTML other than manually copying and pasting it from every page?

Comment: Some websites look at the "User-Agent" header, or other headers, to tell whether the request is coming from a web scraper, and will deny the request if they think it is. What headers are you sending? (Jul 13, 2016 at 3:18)

1 Answer


You need to send a User-Agent header to get a 200 response:

    import requests

    url = 'http://www.metacritic.com/movie/oslo-august-31st/critic-reviews'

    # Any current browser's User-Agent string works here.
    headers = {
        'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/51.0.2704.103 Safari/537.36'),
    }

    response = requests.get(url, headers=headers)
    print(response.status_code)
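If you'd rather not depend on a third-party package, the same fix works with only the standard library. This is a minimal sketch (the `build_request` helper and `UA` constant are my own names, not from the original answer): attach a browser-like User-Agent to a `urllib.request.Request` before opening it.

```python
import urllib.request

# Browser-like User-Agent string; simple bot filters let it through.
UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
      'AppleWebKit/537.36 (KHTML, like Gecko) '
      'Chrome/51.0.2704.103 Safari/537.36')

def build_request(url):
    """Return a Request carrying a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': UA})

req = build_request('http://www.metacritic.com/movie/oslo-august-31st/critic-reviews')
# urllib normalizes header names, so query it as 'User-agent':
print(req.get_header('User-agent'))
# To actually fetch the page: urllib.request.urlopen(req).read()
```

Note that sending a fake User-Agent only gets past the simplest checks; sites can still block by IP, rate, or cookies, so also check the site's terms of service before scraping.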