I have the following code built using the Requests module:

import json
import requests
import jsonobject
import simplejson

url = 'http://www.whoscored.com/StatisticsFeed/1/GetPlayerStatistics'
params = {
            'category': 'shots',
            'subcategory': 'zones',
            'statsAccumulationType': '0',
            'isCurrent': 'true',
            'playerId': '',
            'teamIds': '',
            'matchId': '',
            'stageId': '9155',
            'tournamentOptions': '2',
            'sortBy': 'Rating',
            'sortAscending': '',
            'age': '',
            'ageComparisonType': '',
            'appearances': '',
            'appearancesComparisonType': '0',
            'field': 'Overall',
            'nationality': '',
            'positionOptions': '%27FW%27,%27AML%27,%27AMC%27,%27AMR%27,%27ML%27,%27MC%27,%27MR%27,%27DMC%27,%27DL%27,%27DC%27,%27DR%27,%27GK%27,%27Sub%27',
            'timeOfTheGameEnd': '5',
            'timeOfTheGameStart': '0',
            'isMinApp': '',
            'page': '1',
            'includeZeroValues': '',
            'numberOfPlayersToPick': '10'
            }

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36',
           'X-Requested-With': 'XMLHttpRequest',
           'Host': 'www.whoscored.com',
           'Referer': 'http://www.whoscored.com/'}

responser = requests.get(url, params=params, headers=headers)
print responser.status_code
responser = json.loads(responser.text.replace("'", '"').decode('cp1252'))
print responser

This is causing the following error:

Traceback (most recent call last):
  File "C:\Python27\counter.py", line 41, in <module>
    responser = json.loads(responser.text.replace("'", '"').decode('cp1252'))
  File "C:\Python27\lib\encodings\cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

I can see from the status code 200 that the HTTP request was successful, but I am still getting the above error. I have replaced single quotes with double quotes, as this is an issue I have experienced with this site before. I have also used the decoding method compatible with the Windows command shell, but I am still having trouble.

Can anyone see what the issue is?

Thanks

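  • responser.text is already decoded Unicode text, not bytes, so there is nothing left to decode. Calling .decode() on it makes Python 2 first encode it to ASCII, which is what raises the UnicodeEncodeError. Commented Nov 25, 2014 at 19:00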
  • Why are you doing a search and replace on your response? What sort of response are you expecting? Try doing json.loads(responser.text.decode(encoding='cp1252')) Commented Nov 25, 2014 at 19:02
  • @rpgillespie UTF-8 encoding doesn't work very well with the command shell for non-English characters, whereas the encoding I have used does. Either way, your suggestion did not work. The expected response is a set of nested lists. JSON only allows double quotes, whereas the data returned could contain single quotes, hence the substitution. Commented Nov 25, 2014 at 19:05
  • Why would your server return non-valid JSON? Commented Nov 25, 2014 at 19:05
  • Also, why not just do responser.json()? Commented Nov 25, 2014 at 19:05

1 Answer

The problem is that you think the response is JSON, but it's actually HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" /> 
<meta http-equiv="content-language" content="en" />
<title>WhoScored.com</title>
<script type="text/javascript">var NREUMQ=NREUMQ||[];NREUMQ.push(["mark","firstbyte",new Date().getTime()]);</script></head>
<body style="padding: 20px; font-family:Arial,Helvetica,sans-serif; background-color:#222222;">
    <div style="margin:0 auto; padding: 40px 20px; width:560px; background-color:#fff;">
        The page you requested does not exist in <a href="http://www.whoscored.com">WhoScored.com</a>
    </div>
<script type="text/javascript"> if (!NREUMQ.f) {NREUMQ.f=function() {NREUMQ.push(["load",new Date().getTime()]);var e=document.createElement("script"); e.type="text/javascript"; e.src=(("http:"===document.location.protocol)?"http:":"https:") + "//" + "js-agent.newrelic.com/nr-100.js"; document.body.appendChild(e);if(NREUMQ.a)NREUMQ.a();};NREUMQ.a=window.onload;window.onload=NREUMQ.f;};NREUMQ.push(["nrfj","beacon-2.newrelic.com","47235c2cb5","2727698","MVBVZhMHDEcCV0BdCwgaeV0TCwNYCk5RUDEUXBgcSw4WWQ8=",0,0,new Date().getTime(),"E2B84976C1F7ADB9","","","",""]);</script></body>
</html>

This is not valid JSON, so you cannot use json.loads on it.
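One way to catch this earlier is to look at what the server says it sent before handing the body to the JSON parser. The following is only a minimal sketch: `parse_json_response` is a hypothetical helper name, and it assumes the server sets an accurate Content-Type header, which may not hold for this site. It works on any object with `.headers` and `.text` attributes, such as the response returned by `requests.get`.

```python
import json


def parse_json_response(response):
    """Return the decoded JSON body, or raise ValueError if it is not JSON.

    Hypothetical helper: `response` is any object with `.headers` (a
    dict-like) and `.text` (already-decoded text), e.g. a requests response.
    """
    # Assumption: the server labels JSON bodies with a JSON content type.
    content_type = response.headers.get('Content-Type', '')
    if 'json' not in content_type.lower():
        # Include the start of the body so an HTML error page is easy to spot.
        raise ValueError('Expected JSON but got %r: %r'
                         % (content_type, response.text[:80]))
    # .text is already text, so no extra .decode() call is needed here.
    return json.loads(response.text)
```

With the code from the question, this would be called as `parse_json_response(responser)`; for the HTML page shown above it should raise a ValueError that names `text/html` instead of failing inside the decoder.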

Also, AJAX stands for Asynchronous JavaScript And XML, and is completely unrelated to the type of response you'll get back.

1 Comment

That response is a page-not-found one from the server. It is not the response I would expect, but it is still a server response, which is why I am getting a 200 status code rather than the data I was expecting. I think I need to look again at the params being submitted as part of the URL.
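One thing that may be worth checking in the params is the `positionOptions` value, which is already percent-encoded (`%27FW%27,...`). Requests percent-encodes `params` values itself, so a value that is encoded in advance would likely be encoded a second time. The sketch below uses the standard library's `urlencode` as a stand-in to show the effect; it is an illustration under that assumption, not a confirmed diagnosis of why the site returned its not-found page, and the shortened values are only examples.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# Value already percent-encoded by hand, as in the question (shortened).
pre_encoded = urlencode({'positionOptions': '%27FW%27,%27AML%27'})

# Same value written with literal quotes, leaving the encoding to the library.
raw = urlencode({'positionOptions': "'FW','AML'"})

print(pre_encoded)  # each '%' is encoded again as '%25'
print(raw)          # each quote is encoded exactly once as '%27'
```

If this is the cause, passing the literal string `"'FW','AML',..."` in `params` and comparing `responser.url` with the URL the browser requests would confirm it.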
