5

I'm trying to load this JSON file (which contains non-ASCII characters) as a Python dictionary with Unicode encoding, but I keep getting this error:

return codecs.ascii_decode(input, self.errors)[0]

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 90: ordinal not in range(128)

JSON file content = "tooltip":{ "dxPivotGrid-sortRowBySummary": "Sort\"{0}\"byThisRow",}

import sys  
import json

data = []
with open('/Users/myvb/Desktop/Automation/pt-PT.json') as f:
    for line in f:
        data.append(json.loads(line.encode('utf-8','replace')))
7
  • json.loads has an encoding argument. What is the real content of the pt-PT.json file: lines of valid JSON data, or one long JSON document? In the latter case it would be better to load it directly as a file, not line by line. Commented Apr 8, 2016 at 17:19
  • The string you show as JSON file content is not valid JSON; it is only a fragment of a larger object. Commented Apr 8, 2016 at 17:21
  • I tried loading it as a file as well, but the same error is shown. Commented Apr 8, 2016 at 17:29
  • Try validating the JSON file with a JSON validator first. There are online tools and some command-line ones. Commented Apr 8, 2016 at 17:32
  • Check the modified question now. It's due to some line in the JSON file, and I'm not sure how to fix it. Commented Apr 8, 2016 at 18:08

4 Answers

11

You have several problems, as near as I can tell. First is the file encoding. When you open a file in text mode without specifying an encoding, Python uses the locale's preferred encoding (locale.getpreferredencoding(False)), not a fixed default. Since that varies between machines (especially on Windows), it's a good idea to pass encoding="utf-8" explicitly for most JSON files. Given your error message, I suspect the file was opened with an ASCII encoding.
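To see which encoding open() would fall back to on a given machine, a quick check along these lines may help. It is only a sketch, and the variable names are made up for illustration:

```python
import locale
import sys

# Encoding that open() normally uses in text mode when no encoding= is given.
default_encoding = locale.getpreferredencoding(False)

# Encoding used for file *names*, which is a separate setting.
filename_encoding = sys.getfilesystemencoding()

print("open() default:", default_encoding)
print("file names:", filename_encoding)
```

If the first value is an ASCII variant (for example under a bare "C" locale), reading a UTF-8 file without encoding="utf-8" would be expected to fail on the first non-ASCII byte.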

Next, the file is decoded from UTF-8 into Python strings as it is read by the file object. Each line has therefore already been decoded to a str and is ready for json to read. When you call line.encode('utf-8','replace'), you encode the line back into a bytes object, which json.loads (that is, "load string") could not handle at the time; it only accepts bytes from Python 3.6 onward.
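As a rough illustration of the str/bytes distinction, the snippet below parses an already-decoded string and then shows where the 0xc3 byte in the error comes from. The sample text is an assumption based on the fragment quoted in this thread, wrapped in braces to make it a complete object:

```python
import json

text = '{"tooltip": {"navbar": "Operações de grupo"}}'

# A str is already decoded, so json.loads can parse it directly.
data = json.loads(text)

# The same text as UTF-8 bytes contains 0xc3, which ASCII cannot decode.
raw = text.encode("utf-8")
try:
    raw.decode("ascii")
    failing_byte = None
except UnicodeDecodeError as exc:
    failing_byte = raw[exc.start]

print(data["tooltip"]["navbar"])
print(hex(failing_byte))
```

The non-ASCII letters here are each stored as two bytes in UTF-8, and the first of those bytes is 0xc3, which matches the byte named in the traceback.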

Finally, "tooltip":{ "navbar":"Operações de grupo"} isn't valid JSON, but it does look like one line of a pretty-printed JSON file containing a single JSON object. My guess is that you should read the entire file as one JSON object.
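A small check makes the difference between a fragment and a complete object concrete. The helper name is_valid_json is hypothetical, and wrapping the fragment in braces is just one guess at what the surrounding file looks like:

```python
import json

fragment = '"tooltip":{ "navbar":"Operações de grupo"}'


def is_valid_json(text):
    """Return True if text parses as one complete JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# The bare key/value pair is rejected; the same pair inside braces parses.
fragment_ok = is_valid_json(fragment)
wrapped_ok = is_valid_json("{" + fragment + "}")

print(fragment_ok, wrapped_ok)
```

This is why parsing a pretty-printed file line by line tends to fail even when the file as a whole is valid.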

Putting it all together you get:

import json

with open('/Users/myvb/Desktop/Automation/pt-PT.json', encoding="utf-8") as f:
    data = json.load(f)

From its name, it's possible that this file is encoded with a legacy Portuguese code page rather than UTF-8. If so, the "cp860" encoding (the DOS Portuguese code page) may work better.
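If the encoding is not known in advance, one option is to try a short list of candidates in order. The helper below is a hypothetical sketch, not part of the json module. It tries UTF-8 first because that decoder is strict, while a single-byte code page such as cp860 accepts any byte and so should come last:

```python
import json


def load_json_with_fallback(path, encodings=("utf-8", "cp860")):
    """Hypothetical helper: return the JSON data using the first encoding that decodes."""
    last_error = None
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as f:
                return json.load(f)
        except UnicodeDecodeError as exc:
            # Remember the failure and move on to the next candidate.
            last_error = exc
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error
```

Usage would be data = load_json_with_fallback('/Users/myvb/Desktop/Automation/pt-PT.json'). A wrong single-byte guess will not raise an error; it will silently produce the wrong characters, so the result is worth inspecting.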


3 Comments

It's not because of the Portuguese content but due to this JSON file content: "tooltip":{ "dxPivotGrid-sortRowBySummary": "Sort\"{0}\"byThisRow",}
I see you've changed the string causing problems in your question from one that has non-ASCII characters. The new string doesn't contain a 0xc3 UTF-8 encoding byte, so I don't see how it can produce the "can't decode byte 0xc3" error. Regardless, that string isn't valid JSON but does look like a fragment of valid JSON. Are you saying that the entire file contains just that one line?
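For what it's worth, the escaped quotes in that string are fine on their own; the part that Python's json module rejects is the trailing comma before the closing brace. A minimal sketch, assuming the fragment sits inside an outer object as shown here:

```python
import json

# Raw strings keep the backslashes exactly as they would appear in the file.
with_comma = r'{"tooltip": {"dxPivotGrid-sortRowBySummary": "Sort\"{0}\"byThisRow",}}'
without_comma = r'{"tooltip": {"dxPivotGrid-sortRowBySummary": "Sort\"{0}\"byThisRow"}}'

try:
    json.loads(with_comma)
    error = None
except json.JSONDecodeError as exc:
    error = exc  # the trailing comma is not allowed in JSON

data = json.loads(without_comma)
print(data["tooltip"]["dxPivotGrid-sortRowBySummary"])
```

Note that this failure is a JSONDecodeError, not the UnicodeDecodeError from the question, so the two problems are probably separate.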
This was a life saver when having problems with a script on a Chinese colleague's Windows machine that was fine on Macs.
0

I had the same problem. What worked for me was creating a regular expression and applying it to every line of the JSON file:

import re
# Replace each run of characters outside the whitelist with one space.
REGEXP = r"[^A-Za-z0-9':.;\-?!]+"
new_file_line = re.sub(REGEXP, ' ', old_file_line).strip()

1 Comment

This strips all non-English characters, which is likely not what the OP wants.
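To make that concrete, here is what such a whitelist does to the Portuguese text from the question. The pattern below is assumed to be equivalent to the one in the answer:

```python
import re

# Same whitelist as in the answer: ASCII letters, digits, and a few punctuation marks.
REGEXP = r"[^A-Za-z0-9':.;\-?!]+"

original = "Operações de grupo"
cleaned = re.sub(REGEXP, " ", original).strip()

print(cleaned)  # Opera es de grupo
```

The accented letters are replaced with a space, and so are the quotes, braces and commas that JSON depends on, so the cleaned line is no longer parseable JSON either.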
0

Having a file with content similar to yours, I can read the file in one simple shot:

>>> import json
>>> fname = "data.json"
>>> with open(fname) as f:
...     data = json.load(f)
...
>>> data
{'tooltip': {'navbar': 'Operações de grupo'}}

2 Comments

After a lot of analysis I found that it's giving this error because of this data in the JSON file:
"dxPivotGrid-sortRowBySummary": "Sort\"{0}\"byThisRow",
0

You don't need to read each line. You have two options:

import json

data = []
# Pass the encoding explicitly so the locale default is not used.
with open('/Users/myvb/Desktop/Automation/pt-PT.json', encoding='utf-8') as f:
    data.append(json.load(f))

Or, you can load all lines and pass them to the json module:

import json

data = []
# Pass the encoding explicitly so the locale default is not used.
with open('/Users/myvb/Desktop/Automation/pt-PT.json', encoding='utf-8') as f:
    data.append(json.loads(''.join(f.readlines())))

Obviously, the first suggestion is the best.
