
I'm really close to having a script that fetches JSON from the New York Times API and converts it to CSV. However, occasionally I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 21: ordinal not in range(128)

I think I could avoid this altogether if I converted the output to UTF-8, but I am unsure how to do so. Here is my Python script:

import urllib2
import json
import csv

outfile_path='/NYTComments.csv'

writer = csv.writer(open(outfile_path, 'w'))

url = urllib2.Request('http://api.nytimes.com/svc/community/v2/comments/recent?api-key=ea7aac6c5d0723d7f1e06c8035d27305:5:66594855')

parsed_json = json.load(urllib2.urlopen(url))

print parsed_json

for comment in parsed_json['results']['comments']:
    row = []
    row.append(str(comment['commentSequence']))
    row.append(str(comment['commentBody']))
    row.append(str(comment['commentTitle']))
    row.append(str(comment['approveDate']))
    writer.writerow(row)
  • What does the full error trace look like? Where's the error originating? Commented Oct 5, 2012 at 4:01
  • line 21, in <module> writer.writerow(row) Commented Oct 5, 2012 at 12:42

2 Answers


A few things...

  • I don't know anything about the New York Times API, but I would guess you probably shouldn't publish a code snippet with your "api-key". Just a guess on this point (I've never used this API before).

  • If you look, the API tells you the encoding. You are getting the following back in the response headers:

    Content-Type: application/json; charset=UTF-8
    
  • Googling "python and UnicodeEncodeError" will give you a lot of help. But here, your problem is probably calling str() on the comments: str() uses the 'ascii' codec, and as soon as there is a character above 127, boom, you get exactly the error you are seeing. Here is a pretty good blog post on the topic. It might help you to read over it. (See the sketch just below this list for a minimal reproduction.)
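As a minimal sketch of both points (Python 2; the URL mirrors the one in the question, with the key swapped for a YOUR_KEY placeholder), you can check the charset the API reports and see why str() blows up where .encode('UTF-8') does not:

import urllib2

# Placeholder URL -- substitute your own api-key before running.
response = urllib2.urlopen('http://api.nytimes.com/svc/community/v2/comments/recent?api-key=YOUR_KEY')
print response.info().getheader('Content-Type')   # e.g. application/json; charset=UTF-8

# json.load() hands back unicode strings, so simulate one with a curly quote.
body = u'Comment with \u201ccurly quotes\u201d'
try:
    str(body)                              # implicit ASCII encode -> UnicodeEncodeError
except UnicodeEncodeError as e:
    print 'str() fails:', e
print body.encode('UTF-8', 'replace')      # explicit UTF-8 encode -> plain byte string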

Edit: This solution works for me:

for comment in parsed_json['results']['comments']:
    row = []
    row.append(str(comment['commentSequence']))
    # encode the unicode text to UTF-8 bytes so the implicit ASCII conversion never happens
    row.append(comment['commentBody'].encode('UTF-8', 'replace'))
    row.append(comment['commentTitle'].encode('UTF-8', 'replace'))
    row.append(str(comment['approveDate']))
    writer.writerow(row)
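For what it's worth, the Python 2 csv module only handles byte strings, so encoding the unicode values to UTF-8 yourself before writerow() sidesteps the implicit ASCII conversion that was raising the error.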

4 Comments

I actually think it's being parsed correctly to Unicode and the error is in one of the str calls, which is why I asked for some clarification.
@MarkRansom - yeah. I bet you are probably right. But, I want to point out the fact that he had published his key (which I'm guessing should be private) and that he can find out the encoding by looking at the response headers. He seems to be sort of guessing about what encoding he is dealing with here. I will change my answer.
You're right, I should have hidden the key, but they really aren't valuable. Anyone with an email can get one. I'm still getting the same error when I change the encoding to unicode.
@ChrisJ.Vargo I honestly don't know the value of the key. Just looked like a possible copy/paste "oopsy". comment['commentBody'].encode('UTF-8', 'ignore') doesn't work?

Replace the second and third calls to str() with unicode().

for comment in parsed_json['results']['comments']:
    row = []
    row.append(str(comment['commentSequence']))
    row.append(unicode(comment['commentBody']))
    row.append(unicode(comment['commentTitle']))
    row.append(str(comment['approveDate']))
    writer.writerow(row)
