0

Below is the part of my code that raises an error:

# get all browser products
raw_json_file = open( script_dir + "raw_json.js", 'r' )
raw_json = raw_json_file.read()
all_str = raw_json[ raw_json.find("{"): ]
all_obj = json.loads(all_str)
browser_products = all_obj["categories"]["6"]["products"]

The error I get is:

C:\Python34>python parse.py 8.3.4
argument is 8.3.4
Traceback (most recent call last):
  File "parse.py", line 42, in <module>
    raw_json = raw_json_file.read()
  File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 563011: character maps to <undefined>

Please let me know how to solve this error.

  • Have you created your JSON using the correct mapping between JSON and Python? object == dict, array == list, string == str (unicode in Python 2), number (int) == int (int or long in Python 2), number (real) == float, true == True, false == False, null == None. Commented Aug 8, 2015 at 15:05
  • .js files are JavaScript files; JSON files normally use the .json extension. Commented Aug 8, 2015 at 15:57

2 Answers

3

When you use open() in text mode in Python 3 without an encoding argument, it assumes the file is encoded in the system default, which in your case is the Windows cp1252 encoding (the traceback goes through encodings\cp1252.py). Byte 0x90 has no character assigned in cp1252, hence the error. The file is probably encoded in some other way, for example the very common UTF-8.
You can try

raw_json_file = open( script_dir + "raw_json.js", 'r', encoding='utf8' )

to see if that is so, but really you need to ask whoever provided the file what encoding they used.
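To make the difference concrete, here is a minimal sketch that reproduces the error and the fix on a small stand-in file. The helper name load_js_json, the file contents, and the "var data = " prefix are assumptions made for illustration; your real raw_json.js will differ. The character U+00D0 is used because its UTF-8 encoding is the two bytes 0xC3 0x90, so it contains the same 0x90 byte as in your traceback.

```python
import json
import os
import tempfile


def load_js_json(path, encoding="utf-8"):
    """Read a text file with an explicit encoding and parse the JSON
    that starts at the first '{' (same slicing as in the question)."""
    with open(path, "r", encoding=encoding) as f:
        raw = f.read()
    return json.loads(raw[raw.find("{"):])


# Hypothetical stand-in for raw_json.js, written as UTF-8 bytes.
sample = 'var data = {"categories": {"6": {"products": ["\u00d0-browser"]}}}'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "raw_json.js")
    with open(path, "wb") as f:
        f.write(sample.encode("utf-8"))

    # Decoding as cp1252 fails on byte 0x90, as in the traceback.
    try:
        load_js_json(path, encoding="cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Decoding as UTF-8 succeeds for this file.
    products = load_js_json(path)["categories"]["6"]["products"]
```

If the UTF-8 attempt also raises UnicodeDecodeError on your real file, the file is in yet another encoding and you still need to find out which one.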


0

One way to do this is to read the file as bytes and decode it explicitly as UTF-8.

import json

# get all browser products
# Open in binary mode so Python does not decode with the system default (cp1252)
with open( script_dir + "raw_json.js", 'rb' ) as raw_json_file:
    raw_json = raw_json_file.read().decode('utf-8')  # assumes the file is UTF-8
all_str = raw_json[ raw_json.find("{"): ]
all_obj = json.loads(all_str)
browser_products = all_obj["categories"]["6"]["products"]
# The strings you get from the parsed JSON are ordinary str objects; use them as needed.

Files store characters as bytes, and there are many encodings that map bytes to characters; Unicode text is commonly stored as UTF-8. Python has to decode those bytes with the right encoding before you can use them as strings. See This Article for more information.

Also, the .js extension suggests your file is a JavaScript file rather than a plain JSON file. If you didn't make the file, please confirm that the part starting at the first "{" is valid JSON. If you made the file yourself, then you don't need to worry: the extension is just a way to tell your computer what program to use to open the file.

KEEP IN MIND

When decode() is called without arguments, it decodes from UTF-8. If you do not know the encoding of the file, try installing the chardet module with pip and see the chardet documentation for usage. Once you find out the encoding, pass it as a string argument to decode(). Note that if the file mixes different character sets, chardet may not work, and you will have to convert everything to one uniform encoding first.
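If you would rather not install anything, a cruder substitute for detection is to try a short list of candidate encodings in order. This is only a sketch using the standard library, not chardet; the function name decode_with_fallback and the candidate list are assumptions, and a successful decode does not prove the guess is right.

```python
def decode_with_fallback(data, candidates=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes
    the bytes without error. Strict encodings such as UTF-8 should
    come first, because permissive ones accept almost any input."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# UTF-8 bytes are recognised by the first candidate.
utf8_result = decode_with_fallback("\u00d0".encode("utf-8"))

# A lone 0xE9 byte is invalid UTF-8 but is a valid cp1252 character.
cp1252_result = decode_with_fallback(b"\xe9")
```

You would call it on the bytes read with open(..., 'rb'), then pass the returned text to json.loads as before.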

Sorry if this all seems overwhelming, but there are a lot of people who have had this error. Just google it and find the solution :).
