
I currently have a program that grabs football data from a website and puts it into a dict:

    dict5[name] = ['To: ' + toteam, 'From: ' + fromteam, 'Price: ' + price, 'Date: ' + newdate]

The website is in Portuguese and its native encoding is UTF-8. The toteam, fromteam, price, and date values are all pre-encoded as UTF-8 and just concatenated with the literal strings in the dict. The program runs just fine and prints to stdout with no problem, but when I try to dump it to a JSON file like this...

    with open('test.json', 'w') as f:
        f.write(json.dumps(dict5, indent=2))

...it comes up with the following error:

Traceback (most recent call last):
  ...
  File "C:\Python27\lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "C:\Python27\lib\json\encoder.py", line 203, in encode
    chunks = list(chunks)
  File "C:\Python27\lib\json\encoder.py", line 428, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "C:\Python27\lib\json\encoder.py", line 381, in _iterencode_dict
    yield _encoder(key)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 0: invalid continuation byte
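
For reference, the same kind of error can be triggered with a made-up snippet in which the dict key is a bytestring that is not valid UTF-8 (this is an illustration, not my actual scraped data):

    # -*- coding: utf-8 -*-
    import json

    # Made-up data: a key that is a bytestring in a non-UTF-8 encoding
    # (Latin-1 here) makes json.dumps() fail, because it tries to decode
    # bytestrings as UTF-8 by default. The byte and position in the message
    # differ from mine, but the error class is the same.
    name = u'São Paulo'.encode('latin-1')   # 'S\xe3o Paulo' -- not valid UTF-8
    dict5 = {name: ['To: Corinthians']}

    json.dumps(dict5, indent=2)             # raises UnicodeDecodeError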

This is essentially a copy of another program that grabs from the same site, with the same encoding; however, that one works fine.

I feel like there's some element of unicode-ness that I'm not grasping. Can anybody shed some light on this?

3 Comments
  • Always useful: bit.ly/unipain (Aug 1, 2012 at 19:36)
  • And how do we reproduce that, and with which data? (Aug 1, 2012 at 19:36)
  • Check that, if name is a bytestring, it is also encoded as UTF-8. Btw, you could use json.dump(dict5, f). (Aug 1, 2012 at 20:08)

1 Answer


The toteam, fromteam, price, and dates are all pre-encoded as utf-8

Well, there's your problem. Use unicode strings instead.
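
Roughly like this, with made-up values standing in for your scraped data: decode the UTF-8 bytestrings once, at the boundary, then build the dict from unicode objects.

    # -*- coding: utf-8 -*-
    import json

    # Made-up example values standing in for the scraped UTF-8 bytestrings;
    # decode them once and keep unicode from then on.
    name = 'Ronaldinho Gaúcho'.decode('utf-8')
    toteam = 'Atlético Mineiro'.decode('utf-8')
    fromteam = 'Flamengo'.decode('utf-8')
    price = 'livre'.decode('utf-8')
    newdate = '04/06/2012'.decode('utf-8')

    dict5 = {}
    dict5[name] = [u'To: ' + toteam, u'From: ' + fromteam,
                   u'Price: ' + price, u'Date: ' + newdate]

    # json.dumps() handles unicode fine; with the default ensure_ascii=True
    # the result is an ASCII str, so writing with plain open() is safe.
    with open('test.json', 'w') as f:
        f.write(json.dumps(dict5, indent=2))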


4 Comments

Well, they were decoded to unicode and then re-encoded to the native UTF-8 for testing purposes. Is this wrong? I don't see how that affects dumping to JSON.
@user1549620: JSON works with unicode; thus, the data you pass to the json module must be unicode too. Don't pass it UTF-8 encoded bytestrings.
Okay, I took out all the UTF-8 encodings and it worked perfectly. So as a general rule, is it best to stick to unicode for JSON purposes?
It's best to keep it unicode for all purposes, unless encoding is required.
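
As an aside, if you would rather see the accented characters written out in test.json instead of \uXXXX escapes, one option (just a sketch with made-up data) is to let the file object handle the encoding at the boundary:

    # -*- coding: utf-8 -*-
    import io
    import json

    dict5 = {u'São Paulo': [u'To: Grêmio']}   # made-up unicode data

    out = json.dumps(dict5, indent=2, ensure_ascii=False)
    # With ensure_ascii=False, dumps may return a plain str when everything
    # happens to be ASCII, so unicode() covers both cases before writing to
    # the encoded file object.
    with io.open('test.json', 'w', encoding='utf-8') as f:
        f.write(unicode(out))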
