101

Currently I have this dictionary, printed using pprint:

{'AlarmExTempHum': '\x00\x00\x00\x00\x00\x00\x00\x00',  
'AlarmIn': 0,  
'AlarmOut': '\x00\x00',  
'AlarmRain': 0,  
'AlarmSoilLeaf': '\x00\x00\x00\x00',  
'BarTrend': 60,  
'BatteryStatus': 0,  
'BatteryVolts': 4.751953125,  
'CRC': 55003,
'EOL': '\n\r',
'ETDay': 0,
'ETMonth': 0,
'ETYear': 0,
'ExtraHum1': None,
'ExtraHum2': None,
'ExtraHum3': None,
'ExtraHum4': None,
'ExtraHum5': None,
'ExtraHum6': None,
'ExtraHum7': None,
'ExtraTemp1': None,
'ExtraTemp2': None,
'ExtraTemp3': None,
'ExtraTemp4': None,
'ExtraTemp5': None,
'ExtraTemp6': None,
'ExtraTemp7': None,
'ForecastIcon': 2,
'ForecastRuleNo': 122,
'HumIn': 31,
'HumOut': 94,
'LOO': 'LOO',
'LeafTemps': '\xff\xff\xff\xff',
'LeafWetness': '\xff\xff\xff\x00',
'NextRec': 37,
'PacketType': 0,
'Pressure': 995.9363359295631,
'RainDay': 0.0,
'RainMonth': 0.0,
'RainRate': 0.0,
'RainStorm': 0.0,
'RainYear': 2.8,
'SoilMoist': '\xff\xff\xff\xff',
'SoilTemps': '\xff\xff\xff\xff',
'SolarRad': None,
'StormStartDate': '2127-15-31',
'SunRise': 849,
'SunSet': 1611,
'TempIn': 21.38888888888889,
'TempOut': 0.8888888888888897,
'UV': None,
'WindDir': 219,
'WindSpeed': 3.6,
'WindSpeed10Min': 3.6}

When I do this:

import json
d = (my dictionary above)
jsonarray = json.dumps(d)

I get this error: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

1
  • Your problem lies here : \xff Commented Feb 2, 2013 at 10:50

4 Answers 4

169

If you are fine with non-printable symbols in your json, then add ensure_ascii=False to dumps call.

>>> json.dumps(your_data, ensure_ascii=False)

If ensure_ascii is false, then the return value will be a unicode instance subject to normal Python str to unicode coercion rules instead of being escaped to an ASCII str.

Sign up to request clarification or add additional context in comments.

1 Comment

add indent=n to the options to pretty print, where n is the number of spaces to indent
18

ensure_ascii=False really only defers the issue to the decoding stage:

>>> dict2 = {'LeafTemps': '\xff\xff\xff\xff',}
>>> json1 = json.dumps(dict2, ensure_ascii=False)
>>> print(json1)
{"LeafTemps": "����"}
>>> json.loads(json1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 328, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

Ultimately you can't store raw bytes in a JSON document, so you'll want to use some means of unambiguously encoding a sequence of arbitrary bytes as an ASCII string - such as base64.

>>> import json
>>> from base64 import b64encode, b64decode
>>> my_dict = {'LeafTemps': '\xff\xff\xff\xff',} 
>>> my_dict['LeafTemps'] = b64encode(my_dict['LeafTemps'])
>>> json.dumps(my_dict)
'{"LeafTemps": "/////w=="}'
>>> json.loads(json.dumps(my_dict))
{u'LeafTemps': u'/////w=='}
>>> new_dict = json.loads(json.dumps(my_dict))
>>> new_dict['LeafTemps'] = b64decode(new_dict['LeafTemps'])
>>> print new_dict
{u'LeafTemps': '\xff\xff\xff\xff'}

3 Comments

You could pass arbitrary binary data (inefficiently) in json using 'latin1' encoding
You could, I suppose, but json is designed/intended to use utf-8.
@J.F.Sebastian: Indeed, very inefficient as compared to b64encode. For example, for the 256 character string s = ''.join(chr(i) for i in xrange(256)), len(json.dumps(b64encode(s))) == 346 vs len(json.dumps(s.decode('latin1'))) == 1045.
10

If you use Python 2, don't forget to add the UTF-8 file encoding comment on the first line of your script.

# -*- coding: UTF-8 -*-

This will fix some Unicode problems and make your life easier.

Comments

1

One possible solution that I use is to use python3. It seems to solve many utf issues.

Sorry for the late answer, but it may help people in the future.

For example,

#!/usr/bin/env python3
import json
# your code follows

1 Comment

Sure, you're damn right, Python 3 solved many encoding issues. But that's not the answer to that question. It is explicitly tagged with python-2.7. So what you are saying is something like this: There is no built-in vacuum cleaner in your old car. So please buy a new car instead of adding a vacuum cleaner in your old car.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.