21

What is the best way to load JSON Strings in Python?

I want to use json.loads to process unicode like this:

import json
json.loads(unicode_string_to_load)

I also tried supplying the 'encoding' parameter with the value 'utf-16', but the error did not go away.

Full SSCCE with error:

# -*- coding: utf-8 -*-
import json
value = '{"foo" : "bar"}'
print(json.loads(value)['foo'])     #This is correct, prints 'bar'

some_unicode = unicode("degradé")
#last character is latin e with acute, UTF-8 bytes "\xc3\xa9"
value = '{"foo" : "' + some_unicode + '"}'
print(json.loads(value)['foo'])            #incorrect, throws error

Error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
  • Do you have any source data that shows the problem? Commented Feb 10, 2010 at 3:33
  • I get this error: UnicodeDecodeError: 'utf16' codec can't decode byte 0x38 in position 6: truncated data. I use this command: json.loads(response, encoding='utf-16'). This error occurs for many unicode characters... Commented Feb 10, 2010 at 3:34
  • ..... I think we need to have a little discussion as to what "source data" means... Commented Feb 10, 2010 at 3:37
  • Source data is huge unicode encoded string which I can't paste here... Commented Feb 10, 2010 at 3:39
  • Then paste some of it. Something for us to go on... Commented Feb 10, 2010 at 3:40

4 Answers

12

Typecasting the string into a unicode string using 'latin-1' fixed this error:

UnicodeDecodeError: 'utf16' codec can't decode byte 0x38 in position 6: truncated data

Fixed code:

import json

ustr_to_load = unicode(str_to_load, 'latin-1')

json.loads(ustr_to_load)

And then the error is not thrown.
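
For readers on Python 3, where unicode() no longer exists, the same idea might be sketched roughly as below. The byte string raw is a made-up stand-in for str_to_load, and it is an assumption that the data really is Latin-1 encoded.

```python
import json

# Hypothetical input: JSON bytes containing 0xe9, which is "é" in Latin-1.
raw = b'{"foo": "degrad\xe9"}'

# Decode the bytes explicitly, then parse the resulting text.
text = raw.decode('latin-1')
data = json.loads(text)

print(data['foo'])
```

Note that Latin-1 maps every possible byte value to a character, so this decode never raises. The absence of an error does not by itself confirm that Latin-1 is the right encoding for the data.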


1 Comment

BTW, latin-1 is another name for iso-8859-1, and these days you may also see iso-8859-15 -- it differs in only a handful of characters, most notably by including the Euro sign. If you decode with -1 and the string was encoded with -15 it will mostly be OK, but Euro signs will look very peculiar when you print or show them.
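
A small Python 3 illustration of the Euro sign point, using only the standard codecs; the sample text is made up for the example:

```python
# Byte 0xa4 is the Euro sign in ISO-8859-15 but the generic
# currency sign in ISO-8859-1 (latin-1).
raw = '10 €'.encode('iso-8859-15')

as_15 = raw.decode('iso-8859-15')  # decoded with the encoding it was written in
as_1 = raw.decode('iso-8859-1')    # decoded with the "wrong" sibling encoding

print(repr(raw))
print(as_15)
print(as_1)
```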
6

The OP clarifies (in a comment!)...:

Source data is huge unicode encoded string

Then you have to know which of the many unicode encodings it uses -- clearly not 'utf-16', since that failed, but there are so many others -- 'utf-8', 'iso-8859-15', and so forth. You either try them all until one works, or print repr(str_to_load[:80]) and paste what it shows as an edit of your question, so we can guess on your behalf!-).
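
The "try them all until one works" approach could be sketched along these lines in Python 3. The helper name loads_with_fallback and the default list of candidate encodings are assumptions made for illustration, not something from the question.

```python
import json


def loads_with_fallback(raw, encodings=('utf-8', 'utf-16', 'latin-1')):
    """Try each candidate encoding in order; return (data, encoding_used)."""
    last_error = None
    for enc in encodings:
        try:
            # Both a failed decode and a failed JSON parse raise a
            # ValueError subclass, so either one moves on to the next candidate.
            return json.loads(raw.decode(enc)), enc
        except ValueError as exc:
            last_error = exc
    raise ValueError('no candidate encoding worked: %r' % (last_error,))


# Hypothetical sample: 0xbd is not valid UTF-8 on its own.
sample = b'{"room": "3 \xbd Bathroom"}'
data, used = loads_with_fallback(sample)
print(used, data['room'])
```

The order of the candidates matters: strict encodings such as utf-8 should come first, and latin-1 last, because latin-1 accepts any byte sequence and would otherwise hide a better match.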

7 Comments

It is difficult to identify particular encoding during load because source data may contain characters from various languages of the world. Is there any way to detect encoding type?
str_to_load keeps on changing, utf-8 worked for some, utf-32 worked for some... but how do I auto detect it?
That string is '{"successful":true, "data":[76,{"posting_id":"1753178","site_tender_id":"3188446'
To try and guess the encoding of a byte string -- try chardet.feedparser.org . The string you show is ASCII (which is also valid utf-8 by definition, and also valid iso-8859-1, etc: ASCII is the common subset of most encodings!) so it's impossible to guess what potential non-ASCII encoding it might be in. UnicodeDecodeError messages carry the exact index of the first problematic byte, so show the repr of the 80-long byte string centered on that index when you do get an error.
When I read the entire string, I found unicode characters, have a look at it in the next string... "Lucaya, Grand Bahama; 4 Bedroom, 3 \xbd Bathroom"
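
The suggestion above about showing the bytes around the failing index might look something like this in Python 3. The helper name decode_error_context is hypothetical; the sample bytes are the fragment quoted in the last comment.

```python
def decode_error_context(raw, encoding='utf-8', width=80):
    """Return the repr of up to `width` bytes centered on the first
    undecodable byte, or None if `raw` decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first problematic byte.
        start = max(exc.start - width // 2, 0)
        return repr(raw[start:start + width])
    return None


sample = b'"Lucaya, Grand Bahama; 4 Bedroom, 3 \xbd Bathroom"'
print(decode_error_context(sample))

# Byte 0xbd is the one-half sign in latin-1, which fits the context here.
print(sample.decode('latin-1'))
```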
6

The simplest way I have found is

import simplejson as json

that way your code remains the same

json.loads(str_to_load)

reference: https://simplejson.readthedocs.org/en/latest/


1

With Django you can use its bundled simplejson, calling loads rather than load.

from django.utils import simplejson

simplejson.loads(str_to_load, "utf-8")

1 Comment

This no longer works in Django, as it now uses the json module that comes with Python.
