
I currently have a program that grabs football data from a website and puts it into a dict:

    dict5[name] = ['To: ' + toteam, 'From: ' + fromteam, 'Price: ' + price, 'Date: ' + newdate]

The website is in Portuguese and its native encoding is UTF-8. The toteam, fromteam, price, and date values are all pre-encoded as UTF-8 and just concatenated with the literal strings in the dict. The program runs just fine and prints to stdout with no problem, but when I try to dump it to a JSON file like this...

    with open('test.json', 'w') as f:
        f.write(json.dumps(dict5, indent=2))

...it comes up with the following error:

Traceback (most recent call last):
  ...
  File "C:\Python27\lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "C:\Python27\lib\json\encoder.py", line 203, in encode
    chunks = list(chunks)
  File "C:\Python27\lib\json\encoder.py", line 428, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "C:\Python27\lib\json\encoder.py", line 381, in _iterencode_dict
    yield _encoder(key)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 0: invalid continuation byte
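
For reference, the same kind of error can be triggered with a made-up snippet in which the dict key is a bytestring that is not valid UTF-8 (this is an illustration, not my actual scraped data):

    # -*- coding: utf-8 -*-
    import json

    # Made-up data: a key that is a bytestring in a non-UTF-8 encoding
    # (Latin-1 here) makes json.dumps() fail, because it tries to decode
    # bytestrings as UTF-8 by default. The byte and position in the message
    # differ from mine, but the error class is the same.
    name = u'São Paulo'.encode('latin-1')   # 'S\xe3o Paulo' -- not valid UTF-8
    dict5 = {name: ['To: Corinthians']}

    json.dumps(dict5, indent=2)             # raises UnicodeDecodeError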

This is essentially a copy of another program that grabs from the same site, with the same encoding; however, that one works fine.

I feel like there's some element of unicode-ness that I'm not grasping. Can anybody shed some light on this?

3 Comments
  • Always useful: bit.ly/unipain (Aug 1, 2012 at 19:36)
  • And how do we reproduce that, and with which data? (Aug 1, 2012 at 19:36)
  • Check that, if name is a bytestring, it is also encoded as UTF-8. Btw, you could use json.dump(dict5, f). (Aug 1, 2012 at 20:08)

1 Answer


The toteam, fromteam, price, and dates are all pre-encoded as utf-8

Well, there's your problem. Use unicode strings instead.
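
Roughly like this, with made-up values standing in for your scraped data: decode the UTF-8 bytestrings once, at the boundary, then build the dict from unicode objects.

    # -*- coding: utf-8 -*-
    import json

    # Made-up example values standing in for the scraped UTF-8 bytestrings;
    # decode them once and keep unicode from then on.
    name = 'Ronaldinho Gaúcho'.decode('utf-8')
    toteam = 'Atlético Mineiro'.decode('utf-8')
    fromteam = 'Flamengo'.decode('utf-8')
    price = 'livre'.decode('utf-8')
    newdate = '04/06/2012'.decode('utf-8')

    dict5 = {}
    dict5[name] = [u'To: ' + toteam, u'From: ' + fromteam,
                   u'Price: ' + price, u'Date: ' + newdate]

    # json.dumps() handles unicode fine; with the default ensure_ascii=True
    # the result is an ASCII str, so writing with plain open() is safe.
    with open('test.json', 'w') as f:
        f.write(json.dumps(dict5, indent=2))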


4 Comments

Well, they were decoded to unicode and then re-encoded to the native UTF-8 for testing purposes. Is this wrong? I don't see how that affects dumping to JSON.
@user1549620: JSON works with unicode; thus, the data you pass to the json module must be unicode too. Don't pass it UTF-8 encoded bytestrings.
Okay, I took out all the UTF-8 encodings and it worked perfectly. So as a general rule, is it best to stick to unicode for JSON purposes?
It's best to keep it unicode for all purposes, unless encoding is required.
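
As an aside, if you would rather see the accented characters written out in test.json instead of \uXXXX escapes, one option (just a sketch with made-up data) is to let the file object handle the encoding at the boundary:

    # -*- coding: utf-8 -*-
    import io
    import json

    dict5 = {u'São Paulo': [u'To: Grêmio']}   # made-up unicode data

    out = json.dumps(dict5, indent=2, ensure_ascii=False)
    # With ensure_ascii=False, dumps may return a plain str when everything
    # happens to be ASCII, so unicode() covers both cases before writing to
    # the encoded file object.
    with io.open('test.json', 'w', encoding='utf-8') as f:
        f.write(unicode(out))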
