
I'm trying to get the HTML of this page:

    url = 'http://www.metacritic.com/movie/oslo-august-31st/critic-reviews'

and I'm fetching it with requests:

    import requests
    oslo = requests.get(url)

but the site seems to know that I'm accessing it this way, and when I open the response I get Varnish's 403 page:

    403 Forbidden

    Error 403 Forbidden
    Forbidden
    Guru Meditation:
    XID: 961167012
    Varnish cache server

Is there any way to access the HTML other than manually copying and pasting it from every page?

Comment: Some websites look at the "User-Agent" header, or other headers, to tell whether the request is coming from a web scraper, and will deny the request if they think it is. What headers are you sending? (Jul 13, 2016 at 3:18)

1 Answer


You need to send a User-Agent header to get a 200 response:

    import requests

    url = 'http://www.metacritic.com/movie/oslo-august-31st/critic-reviews'

    # Any current browser's User-Agent string works here.
    headers = {
        'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
                       'AppleWebKit/537.36 (KHTML, like Gecko) '
                       'Chrome/51.0.2704.103 Safari/537.36'),
    }

    response = requests.get(url, headers=headers)
    print(response.status_code)
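If you'd rather not depend on a third-party package, the same fix works with only the standard library. This is a minimal sketch (the `build_request` helper and `UA` constant are my own names, not from the original answer): attach a browser-like User-Agent to a `urllib.request.Request` before opening it.

```python
import urllib.request

# Browser-like User-Agent string; simple bot filters let it through.
UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) '
      'AppleWebKit/537.36 (KHTML, like Gecko) '
      'Chrome/51.0.2704.103 Safari/537.36')

def build_request(url):
    """Return a Request carrying a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': UA})

req = build_request('http://www.metacritic.com/movie/oslo-august-31st/critic-reviews')
# urllib normalizes header names, so query it as 'User-agent':
print(req.get_header('User-agent'))
# To actually fetch the page: urllib.request.urlopen(req).read()
```

Note that sending a fake User-Agent only gets past the simplest checks; sites can still block by IP, rate, or cookies, so also check the site's terms of service before scraping.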