I currently have a program which grabs football data from a website and ports it into a dict:
dict5[name] = ['To: ' + toteam, 'From: ' + fromteam, 'Price: ' + price, 'Date: ' + newdate]
The website is in Portuguese and the native encoding is UTF-8. The toteam, fromteam, price, and dates are all pre-encoded as UTF-8, just concatenated with the strings in the dict. The program runs just fine and prints to stdout with no problem, but when I try to dump it to a JSON file like this...
with open('test.json', 'w') as f:
    f.write(json.dumps(dict5, indent=2))
...it comes up with the following error:
Traceback:....
C:\Python27\lib\json\__init__.py, line 238, in dumps
**kw).encode(obj)
C:\Python27\lib\json\encoder.py, line 203, in encode
chunks = list(chunks)
C:\Python27\lib\json\encoder.py, line 428, in _iterencode
for chunk in _iterencode_dict(o, _current_indent_level):
C:\Python27\lib\json\encoder.py, line 381, in _iterencode_dict
yield _encoder(key)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 0: invalid continuation byte
This is essentially a copy of another program that grabs data from the same site, with the same encoding; however, that one works fine.
I feel like there's some element of unicode-ness that I'm not grasping. Can anybody shed some light on this?
If name is a bytestring, then make sure it is also encoded as UTF-8. Btw, you could use json.dump(dict5, f).
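A minimal sketch of that check, assuming the keys and values arrive as byte strings. The helpers to_text and normalize, the Latin-1 fallback, and the sample data are all hypothetical and not taken from the original program. The idea is to decode everything to text before serializing, so json never has to guess the encoding of a key:

```python
import json


def to_text(value, encoding="utf-8", fallback="latin-1"):
    """Return value as text, decoding byte strings if necessary."""
    if isinstance(value, bytes):
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            # Assumption: bytes that are not valid UTF-8 are Latin-1.
            # Swap in whatever encoding the data actually uses.
            return value.decode(fallback)
    return value


def normalize(record):
    """Decode every key and every list item of a dict of lists."""
    return dict(
        (to_text(key), [to_text(item) for item in items])
        for key, items in record.items()
    )


# Hypothetical sample: the first key is UTF-8 bytes, the second is Latin-1.
dict5 = {
    b"Jo\xc3\xa3o": [b"To: Benfica", b"Price: 1M"],
    b"Jo\xe3o Silva": [b"To: Porto", b"Price: 2M"],
}

clean = normalize(dict5)

# With the default ensure_ascii=True the output is plain ASCII,
# so a file opened in ordinary text mode is enough.
with open("test.json", "w") as f:
    json.dump(clean, f, indent=2)
```

If the fallback branch is ever taken for name, that would suggest the key is not UTF-8 after all, which would be consistent with the traceback failing in _encoder(key) rather than on one of the values.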