
I'm really close to having a script that fetches JSON from the New York Times API and converts it to CSV. However, occasionally I get this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 21: ordinal not in range(128)

I think I could avoid this altogether if I converted the output to UTF-8, but I am unsure how to do so. Here is my Python script:

import urllib2
import json
import csv

outfile_path='/NYTComments.csv'

writer = csv.writer(open(outfile_path, 'w'))

url = urllib2.Request('http://api.nytimes.com/svc/community/v2/comments/recent?api-key=ea7aac6c5d0723d7f1e06c8035d27305:5:66594855')

parsed_json = json.load(urllib2.urlopen(url))

print parsed_json

for comment in parsed_json['results']['comments']:
    row = []
    row.append(str(comment['commentSequence']))
    row.append(str(comment['commentBody']))
    row.append(str(comment['commentTitle']))
    row.append(str(comment['approveDate']))
    writer.writerow(row)
  • What does the full error trace look like? Where's the error originating? Commented Oct 5, 2012 at 4:01
  • line 21, in <module> writer.writerow(row) Commented Oct 5, 2012 at 12:42

2 Answers


A few things...

  • I don't know anything about the New York Times API, but I would guess you probably shouldn't publish a code snippet with your "api-key". Just a guess on this point (I've never used this API before).

  • If you look, the API tells you the encoding. You are getting the following back in the response headers:

    Content-Type: application/json; charset=UTF-8
    
  • Googling "python and UnicodeEncodeError" will give you a lot of help. But here, your problem is probably calling str() on the comments: str() uses the 'ascii' codec, and as soon as there is a character above 127, boom, you get exactly the error you are seeing. Here is a pretty good blog post on the topic. It might help you to read over it. (See the sketch just below this list for a minimal reproduction.)
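As a minimal sketch of both points (Python 2; the URL mirrors the one in the question, with the key swapped for a YOUR_KEY placeholder), you can check the charset the API reports and see why str() blows up where .encode('UTF-8') does not:

import urllib2

# Placeholder URL -- substitute your own api-key before running.
response = urllib2.urlopen('http://api.nytimes.com/svc/community/v2/comments/recent?api-key=YOUR_KEY')
print response.info().getheader('Content-Type')   # e.g. application/json; charset=UTF-8

# json.load() hands back unicode strings, so simulate one with a curly quote.
body = u'Comment with \u201ccurly quotes\u201d'
try:
    str(body)                              # implicit ASCII encode -> UnicodeEncodeError
except UnicodeEncodeError as e:
    print 'str() fails:', e
print body.encode('UTF-8', 'replace')      # explicit UTF-8 encode -> plain byte string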

Edit: This solution works for me:

for comment in parsed_json['results']['comments']:
    row = []
    row.append(str(comment['commentSequence']))
    # encode the unicode text to UTF-8 bytes so the implicit ASCII conversion never happens
    row.append(comment['commentBody'].encode('UTF-8', 'replace'))
    row.append(comment['commentTitle'].encode('UTF-8', 'replace'))
    row.append(str(comment['approveDate']))
    writer.writerow(row)
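For what it's worth, the Python 2 csv module only handles byte strings, so encoding the unicode values to UTF-8 yourself before writerow() sidesteps the implicit ASCII conversion that was raising the error.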

4 Comments

I actually think it's being parsed correctly to Unicode and the error is in one of the str calls, which is why I asked for some clarification.
@MarkRansom - yeah. I bet you are probably right. But, I want to point out the fact that he had published his key (which I'm guessing should be private) and that he can find out the encoding by looking at the response headers. He seems to be sort of guessing about what encoding he is dealing with here. I will change my answer.
You're right, I should have hidden the key, but they really aren't valuable. Anyone with an email can get one. I'm still getting the same error when I change the encoding to unicode.
@ChrisJ.Vargo I honestly don't know the value of the key. Just looked like a possible copy/paste "oopsy". comment['commentBody'].encode('UTF-8', 'ignore') doesn't work?

Replace the second and third calls to str() with unicode().

for comment in parsed_json['results']['comments']:
    row = []
    row.append(str(comment['commentSequence']))
    row.append(unicode(comment['commentBody']))
    row.append(unicode(comment['commentTitle']))
    row.append(str(comment['approveDate']))
    writer.writerow(row)
