
I have the following data container, which is constantly being updated:

    data = []
    for val, track_id in zip(values, list(track_ids)):
        if val < threshold:
            # fetch the track once and structure the data as a dictionary
            track = sp.track(track_id)
            pre_data = {"artist": track['artists'][0]['name'],
                        "track": track['name'],
                        "feature": filter_name,
                        "value": val}
            data.append(pre_data)
    # write to file
    with open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w') as f:
        json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)

but I am getting a lot of errors like this:

    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 190, in dump
        fp.write(chunk)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

Is there a way I can get rid of this encoding problem once and for all?

I was told that this would do it:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

but many people do not recommend it.

I'm using Python 2.7.10.

Any clues?

4 Comments
  • Show the full error trace so we can see where the error is coming from. And is this Python 2 or 3? Commented Nov 21, 2016 at 20:56
  • sys.setdefaultencoding may have worked in Python 2 but doesn't exist in Python 3. And it might help with print() but not with other operations like writing to a file, so you should show the full error message and the line that causes the problem. Commented Nov 21, 2016 at 21:21
  • @MarkRansom updated, thanks Commented Nov 21, 2016 at 21:41
  • @furas full error above Commented Nov 21, 2016 at 21:41

3 Answers


When you write to a file that was opened in text mode, Python encodes the string for you. The default encoding is ascii, which generates the error you see; there are a lot of characters that can't be encoded to ASCII.

The solution is to open the file with a different encoding. In Python 2 you must use the codecs module; in Python 3 you can pass the encoding= parameter directly to open. utf-8 is a popular choice since it can handle all Unicode characters, and for JSON specifically it's the standard; see https://en.wikipedia.org/wiki/JSON#Data_portability_issues.

import codecs
with codecs.open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
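For reference, the Python 3 equivalent needs no codecs at all, since open accepts encoding= directly. A minimal sketch (the filename and sample data here are placeholders standing in for the question's variables):

```python
import json

# Sample data standing in for the question's track dictionaries.
data = [{"artist": "Björk", "track": "Jóga", "feature": "energy", "value": 0.5}]

# Python 3: open() takes encoding= directly, so the file object
# encodes the Unicode text to UTF-8 bytes on write.
with open("out.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
```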

2 Comments

You beat me to it! The RFC only allows utf-8, utf-16 and utf-32 encodings but puts restrictions on the second two (no BOM for instance) and hints that utf-8 is the only interoperable way to do it. mbcs would violate the rfc. I thought JSON was utf-8 only and was surprised that the other encodings are even allowed.
@tdelaney I've never dealt with JSON directly so I was unaware of the character set restriction, thanks! I'll edit the answer.

Your object contains Unicode strings, and Python 2.x's support for Unicode can be a bit spotty. First, let's make a short example that demonstrates the problem:

>>> obj = {"artist":u"Björk"}
>>> import json
>>> with open('deleteme', 'w') as f:
...     json.dump(obj, f, ensure_ascii=False)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 190, in dump
    fp.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 3: ordinal not in range(128)

From the json.dump help text:

If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
instance consisting of ASCII characters only.  If ``ensure_ascii`` is
``False``, some chunks written to ``fp`` may be ``unicode`` instances.
This usually happens because the input contains unicode strings or the
``encoding`` parameter is used. Unless ``fp.write()`` explicitly
understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
cause an error.

Ah! There is the solution. Either use the default ensure_ascii=True and get ASCII-escaped Unicode characters, or use the codecs module to open the file with the encoding you want. This works:

>>> import codecs
>>> with codecs.open('deleteme', 'w', encoding='utf-8') as f:
...     json.dump(obj, f, ensure_ascii=False)
... 
>>> 
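For completeness, the other branch — keeping the default ensure_ascii=True — sidesteps the encoding issue entirely, at the cost of \uXXXX escapes in the file (a sketch; the filename is a placeholder):

```python
import json

obj = {"artist": u"Björk"}

# With the default ensure_ascii=True, json.dump emits pure ASCII,
# escaping non-ASCII characters as \uXXXX, so a plain text-mode
# file handle works even on Python 2.
with open('deleteme_ascii', 'w') as f:
    json.dump(obj, f)

# The file now contains {"artist": "Bj\u00f6rk"}; json.load
# decodes the escape back to the original character.
```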



Why not encode the specific string instead? Try the .encode('utf-8') method on the string that is raising the exception.
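That patches one string at a time, but the idea generalizes: serialize to a text string first, then encode the whole thing to UTF-8 bytes yourself and write them in binary mode. A sketch (the filename is a placeholder), which works the same way on Python 2 and 3:

```python
import json

obj = {"artist": u"Björk"}

# Encode explicitly instead of relying on the file object's
# default (ASCII) encoder: dump to a string, encode it to
# UTF-8 bytes, and write those bytes in binary mode.
text = json.dumps(obj, ensure_ascii=False)
with open('deleteme_bytes.json', 'wb') as f:
    f.write(text.encode('utf-8'))
```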

