
I have the following data container, which is constantly being updated:

    data = []
    for val, track_id in zip(values, list(track_ids)):
        if val < threshold:
            # fetch the track once and structure the data as a dictionary
            track = sp.track(track_id)
            pre_data = {"artist": track['artists'][0]['name'],
                        "track": track['name'],
                        "feature": filter_name,
                        "value": val}
            data.append(pre_data)
    # write to file
    with open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w') as f:
        json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)

but I am getting a lot of errors like this:

    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 190, in dump
        fp.write(chunk)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

Is there a way I can get rid of this encoding problem once and for all?

I was told that this would do it:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

but many people do not recommend it.

I'm using Python 2.7.10.

Any clues?

4 Comments
  • Show the full error trace so we can see where the error is coming from. And is this Python 2 or 3? Commented Nov 21, 2016 at 20:56
  • sys.setdefaultencoding may have worked in Python 2 but doesn't exist in Python 3. And it might help with print() but not with other operations like writing to a file, so you should show the full error message and the line that causes the problem. Commented Nov 21, 2016 at 21:21
  • @MarkRansom updated, thanks Commented Nov 21, 2016 at 21:41
  • @furas full error above Commented Nov 21, 2016 at 21:41

3 Answers


When you write to a file that was opened in text mode, Python encodes the string for you. The default encoding is ascii, which generates the error you see; there are a lot of characters that can't be encoded to ASCII.

The solution is to open the file with a different encoding. In Python 2 you must use the codecs module; in Python 3 you can pass the encoding= parameter directly to open. utf-8 is a popular choice since it can handle all Unicode characters, and for JSON specifically it's the standard; see https://en.wikipedia.org/wiki/JSON#Data_portability_issues.

import codecs
with codecs.open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
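For reference, the Python 3 equivalent needs no codecs at all, since open accepts encoding= directly. A minimal sketch (the filename and sample data here are placeholders standing in for the question's variables):

```python
import json

# Sample data standing in for the question's track dictionaries.
data = [{"artist": "Björk", "track": "Jóga", "feature": "energy", "value": 0.5}]

# Python 3: open() takes encoding= directly, so the file object
# encodes the Unicode text to UTF-8 bytes on write.
with open("out.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
```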

2 Comments

You beat me to it! The RFC only allows utf-8, utf-16 and utf-32 encodings but puts restrictions on the second two (no BOM for instance) and hints that utf-8 is the only interoperable way to do it. mbcs would violate the rfc. I thought JSON was utf-8 only and was surprised that the other encodings are even allowed.
@tdelaney I've never dealt with JSON directly so I was unaware of the character set restriction, thanks! I'll edit the answer.

Your object contains Unicode strings, and Python 2.x's support for Unicode can be a bit spotty. First, let's make a short example that demonstrates the problem:

>>> obj = {"artist":u"Björk"}
>>> import json
>>> with open('deleteme', 'w') as f:
...     json.dump(obj, f, ensure_ascii=False)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 190, in dump
    fp.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 3: ordinal not in range(128)

From the json.dump help text:

If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
instance consisting of ASCII characters only.  If ``ensure_ascii`` is
``False``, some chunks written to ``fp`` may be ``unicode`` instances.
This usually happens because the input contains unicode strings or the
``encoding`` parameter is used. Unless ``fp.write()`` explicitly
understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
cause an error.

Ah! There is the solution. Either use the default ensure_ascii=True and get ASCII-escaped Unicode characters, or use the codecs module to open the file with the encoding you want. This works:

>>> import codecs
>>> with codecs.open('deleteme', 'w', encoding='utf-8') as f:
...     json.dump(obj, f, ensure_ascii=False)
... 
>>> 
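For completeness, the other branch — keeping the default ensure_ascii=True — sidesteps the encoding issue entirely, at the cost of \uXXXX escapes in the file (a sketch; the filename is a placeholder):

```python
import json

obj = {"artist": u"Björk"}

# With the default ensure_ascii=True, json.dump emits pure ASCII,
# escaping non-ASCII characters as \uXXXX, so a plain text-mode
# file handle works even on Python 2.
with open('deleteme_ascii', 'w') as f:
    json.dump(obj, f)

# The file now contains {"artist": "Bj\u00f6rk"}; json.load
# decodes the escape back to the original character.
```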



Why not encode the specific string instead? Try the .encode('utf-8') method on the string that is raising the exception.
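That patches one string at a time, but the idea generalizes: serialize to a text string first, then encode the whole thing to UTF-8 bytes yourself and write them in binary mode. A sketch (the filename is a placeholder), which works the same way on Python 2 and 3:

```python
import json

obj = {"artist": u"Björk"}

# Encode explicitly instead of relying on the file object's
# default (ASCII) encoder: dump to a string, encode it to
# UTF-8 bytes, and write those bytes in binary mode.
text = json.dumps(obj, ensure_ascii=False)
with open('deleteme_bytes.json', 'wb') as f:
    f.write(text.encode('utf-8'))
```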

