I want to scrape a website (for player statistics from a football match) but I get a 403 error. This is my first attempt at scraping.
import requests

headers = {'Sec-Fetch-Mode': 'no-cors',
           'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}
# url holds the address of the page being scraped (defined elsewhere)
result = requests.get(url, headers=headers)
print(result.status_code)
Edit: I can open the webpage in my browser (Chrome).
Edit2: If I run
print(result.status_code)
print(result.headers)
print(result.content)
then I get the following
403
{'Content-Type': 'text/html', 'Cache-Control': 'no-cache', 'Connection': 'close', 'Content-Length': '736', 'X-Iinfo': '9-168604272-0 0NNN RT(1566297863307 56) q(0 -1 -1 -1) r(0 -1) B15(4,200,0) U18', 'X-Iejgwucgyu': '1', 'Set-Cookie': 'visid_incap_774904=wSb3+5UxQeC+slK3rAhjswfPW10AAAAAQUIPAAAAAADmqJS6Gs0uzOV2Z5XomjoU; expires=Wed, 19 Aug 2020 06:56:00 GMT; path=/; Domain=.whoscored.com, incap_ses_198_774904=2GHrGcAd9C8niMLwwnK/AgfPW10AAAAAttp7+XadyowHY5iqiWs/Yg==; path=/; Domain=.whoscored.com'}
b'<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe id="main-iframe" src="/_Incapsula_Resource?CWUDNSAI=21&xinfo=9-168604272-0%200NNN%20RT%281566297863307%2056%29%20q%280%20-1%20-1%20-1%29%20r%280%20-1%29%20B15%284%2c200%2c0%29%20U18&incident_id=198003090216026722-548063901729035097&edet=15&cinfo=04000000" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 198003090216026722-548063901729035097</iframe></body></html>'
User-Agent. Maybe the server had a problem one day. Can you open it in a web browser? Or maybe you made so many requests that the server blocked you.
result.content contains the word ROBOTS, which makes me think my request has been handled as a bot.
requests.get) you can mimic a web browser, but not using requests. You can see this question for details: stackoverflow.com/q/22966787/9321755
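To check whether a 403 like the one above comes from bot protection rather than from the site itself, a small heuristic can look for the markers visible in the printed response: the X-Iinfo header, the visid_incap_ / incap_ses_ cookies, and the /_Incapsula_Resource iframe. This is only a sketch. The function name looks_like_incapsula_block and the shortened sample values are made up for illustration, and the marker list is taken from this one response, so it may not cover every block page.

```python
def looks_like_incapsula_block(status_code, headers, body):
    """Heuristically detect an Incapsula block page (hypothetical helper)."""
    # Normalise header names so a plain dict works as well as requests' headers.
    lowered = {name.lower(): value for name, value in headers.items()}
    cookies = lowered.get("set-cookie", "")
    # result.content is bytes; accept str too.
    text = body.decode("utf-8", errors="replace") if isinstance(body, bytes) else body
    markers = [
        "x-iinfo" in lowered,                                   # header seen in the output above
        "visid_incap_" in cookies or "incap_ses_" in cookies,   # Incapsula cookies
        "_Incapsula_Resource" in text,                          # iframe in the block page
    ]
    return status_code == 403 and any(markers)


# Shortened sample values modelled on the response printed in the question.
sample_headers = {
    "Content-Type": "text/html",
    "X-Iinfo": "9-168604272-0 0NNN",
    "Set-Cookie": "visid_incap_774904=abc; path=/; Domain=.whoscored.com",
}
sample_body = b'<iframe id="main-iframe" src="/_Incapsula_Resource?CWUDNSAI=21">Request unsuccessful.</iframe>'

print(looks_like_incapsula_block(403, sample_headers, sample_body))  # True
```

With a real response this would be called as looks_like_incapsula_block(result.status_code, result.headers, result.content). A True result suggests that changing headers alone is unlikely to help, which matches the comment about needing to mimic a full browser.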