Python: JSON to CSV Unicode Encoding

Question

I have a JSON file of tweets, most of them in German, and I'd like to convert it to a CSV file. I'm new to Python and have followed this solution, but an encoding problem is keeping me from proceeding.

Here is my code to generate a CSV file from the JSON file:

import csv
import json

x="""[
    {
   "created_at": "Thu, 24 Jan 2013 23:59:58 +0000",
   "id": 294595428815622140,
   "source": "&lt;a href=&quot;http://www.tweetdeck.com&quot;&gt;TweetDeck&lt;/a&gt;",
   "text": "RT @marthadear: ich heule gerade, aber h\\u00f6rt blo\\u00df nicht auf! #aufschrei",
   "user": {
      "profile_image_url": "http://a0.twimg.com/profile_images/3187103131/33d7b666c757b7c50b01342f05345210_normal.jpeg",
      "screen_name": "KatrinaR47"
   }
    }
]"""


x = json.loads(x)

f = csv.writer(open("test2.csv", "wb+"))

for x in x:
    f.writerow([x["created_at"], 
                x["id"], 
                x["source"], 
                x["text"],
                x["user"]["profile_image_url"],
                x["user"]["screen_name"]])

The error message I'm getting is the following:

UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-32-0096601ea1bd> in <module>()
     26                 x["text"],
     27                 x["user"]["profile_image_url"],
---> 28                 x["user"]["screen_name"]])
     29 

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 40: ordinal not in range(128)

So apparently the encoding doesn't work. I've looked at a number of different solutions, but so far I haven't been able to apply one successfully. Can you help me out?

Are you using Python2 or Python3?

– Robᵩ, Commented Aug 19, 2015 at 0:34

Adrian Liaw · Accepted Answer · 2015-08-18 09:29:31Z

1

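Try encoding x["text"] as UTF-8 before writing it (line 26 in your traceback). In Python 2 the csv module writes byte strings, so a unicode value is implicitly encoded as ASCII, which fails on characters such as ö.
Like this: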

f.writerow([x["created_at"], 
            x["id"], 
            x["source"], 
            x["text"].encode("utf-8"),
            x["user"]["profile_image_url"],
            x["user"]["screen_name"]])
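
The question does not say which Python version is in use, although the traceback suggests Python 2. For readers on Python 3, a minimal sketch of the same conversion is shown here; in that case the file object does the encoding, so no explicit .encode() call is needed. The function name tweets_to_csv and its parameters are illustrative and do not come from the question.

```python
import csv
import json


def tweets_to_csv(json_text, csv_path):
    """Write selected tweet fields to a UTF-8 encoded CSV file (Python 3 only)."""
    tweets = json.loads(json_text)
    # newline="" lets the csv module control line endings itself;
    # encoding="utf-8" makes the file object encode non-ASCII text.
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for tweet in tweets:
            writer.writerow([
                tweet["created_at"],
                tweet["id"],
                tweet["source"],
                tweet["text"],
                tweet["user"]["profile_image_url"],
                tweet["user"]["screen_name"],
            ])
```

Calling tweets_to_csv(x, "test2.csv") with the JSON string from the question would write one row per tweet. Opening the file in a with block also ensures it is closed, and the data flushed, once writing is done.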


2 Comments

Yodel Over a year ago

Thank you so much, that worked perfectly. Any idea how I can decode the HTML entities like "&lt;" or "&quot;" in "source"?

Adrian Liaw Over a year ago

Use the "html" library: import html, then call html.unescape(x["source"]). Note that html.unescape requires Python 3.4 or later.
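
As a small illustration of the suggestion above, the following sketch decodes the "source" value from the question's sample tweet. It assumes Python 3; on Python 2 there is no html module, and HTMLParser.HTMLParser().unescape() is the usual counterpart.

```python
import html

# The "source" field exactly as it appears in the sample JSON.
source = "&lt;a href=&quot;http://www.tweetdeck.com&quot;&gt;TweetDeck&lt;/a&gt;"

# html.unescape converts named and numeric character references to text.
decoded = html.unescape(source)
print(decoded)  # <a href="http://www.tweetdeck.com">TweetDeck</a>
```

The decoded value can then be passed to writerow in place of x["source"].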
