
I have a JSON file with tweets, most of them in German. I'd like to convert it to a CSV file. I'm new to Python and have followed this solution, but an encoding problem keeps me from proceeding.

Here is my code to generate a CSV file from the JSON file:

import csv
import json

x="""[
    {
   "created_at": "Thu, 24 Jan 2013 23:59:58 +0000",
   "id": 294595428815622140,
   "source": "&lt;a href=&quot;http://www.tweetdeck.com&quot;&gt;TweetDeck&lt;/a&gt;",
   "text": "RT @marthadear: ich heule gerade, aber h\\u00f6rt blo\\u00df nicht auf! #aufschrei",
   "user": {
      "profile_image_url": "http://a0.twimg.com/profile_images/3187103131/33d7b666c757b7c50b01342f05345210_normal.jpeg",
      "screen_name": "KatrinaR47"
   }
    }
]"""


x = json.loads(x)

f = csv.writer(open("test2.csv", "wb+"))

for x in x:
    f.writerow([x["created_at"], 
                x["id"], 
                x["source"], 
                x["text"],
                x["user"]["profile_image_url"],
                x["user"]["screen_name"]])

This is the error message I get:

UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-32-0096601ea1bd> in <module>()
     26                 x["text"],
     27                 x["user"]["profile_image_url"],
---> 28                 x["user"]["screen_name"]])
     29 

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 40: ordinal not in range(128)

So apparently the encoding fails. I've tried a number of different solutions, but so far I haven't been able to apply any of them successfully. Can you help me out?
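A minimal sketch (Python 3 syntax) that isolates the failure: the character u'\xf6' ("ö") lies outside ASCII's 0 to 127 range, so encoding it with the "ascii" codec fails while "utf-8" succeeds. Python 2's csv module most likely hits this because it converts unicode values to byte strings using the default ASCII codec. The helper name can_encode is hypothetical, chosen only for illustration.

```python
def can_encode(text, codec):
    """Return True if every character in text can be represented in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        # Raised for the first character the codec cannot represent, e.g. '\xf6'
        return False
    return True

# Excerpt of the tweet text from the question
sample = "aber h\u00f6rt blo\u00df nicht auf!"
for codec in ("ascii", "utf-8"):
    print(codec, can_encode(sample, codec))
```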

  • Are you using Python 2 or Python 3? (commented Aug 19, 2015 at 0:34)

1 Answer

Try encoding x["text"] as UTF-8 before writing it (line 26 in the traceback), like this:

f.writerow([x["created_at"], 
            x["id"], 
            x["source"], 
            x["text"].encode("utf-8"),
            x["user"]["profile_image_url"],
            x["user"]["screen_name"]])
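The .encode("utf-8") call fits the question's code, which appears to be Python 2 ("wb+" mode, u'\xf6' in the traceback). On Python 3 the csv module would stringify those bytes and write b'...' into the cell, so a Python 3 version should keep the text as str and let the file object do the encoding. A minimal sketch, assuming Python 3; the function name tweets_to_csv and the temp-directory output path are illustrative choices, not part of the original code.

```python
import csv
import json
import os
import tempfile

RAW = """[
  {
    "created_at": "Thu, 24 Jan 2013 23:59:58 +0000",
    "id": 294595428815622140,
    "source": "&lt;a href=&quot;http://www.tweetdeck.com&quot;&gt;TweetDeck&lt;/a&gt;",
    "text": "RT @marthadear: ich heule gerade, aber h\\u00f6rt blo\\u00df nicht auf! #aufschrei",
    "user": {
      "profile_image_url": "http://a0.twimg.com/profile_images/3187103131/33d7b666c757b7c50b01342f05345210_normal.jpeg",
      "screen_name": "KatrinaR47"
    }
  }
]"""

def tweets_to_csv(tweets, path):
    """Write one CSV row per tweet; the UTF-8 file object handles encoding."""
    # newline="" is what the csv docs recommend for files passed to csv.writer
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for tweet in tweets:
            writer.writerow([
                tweet["created_at"],
                tweet["id"],
                tweet["source"],
                tweet["text"],  # plain str: no .encode() on Python 3
                tweet["user"]["profile_image_url"],
                tweet["user"]["screen_name"],
            ])

tweets = json.loads(RAW)
out_path = os.path.join(tempfile.gettempdir(), "test2.csv")
tweets_to_csv(tweets, out_path)
```

Using a descriptive loop variable (tweet) instead of reusing x also avoids rebinding the list name while iterating over it.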

2 Comments

Thank you so much, that worked perfectly. Any idea how I can decode HTML entities like "&lt;" or "&quot;" in "source"?
Use the html library (Python 3.4+): import html, then html.unescape(x["source"])
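A short sketch of the suggestion in the last comment, applied to the "source" value from the question. html.unescape is in the standard library from Python 3.4 on; on Python 2 the commonly cited (undocumented) workaround is HTMLParser.HTMLParser().unescape(...), which is an assumption worth checking against your interpreter.

```python
import html

# "source" value as it appears in the tweet JSON, with HTML entities
source = "&lt;a href=&quot;http://www.tweetdeck.com&quot;&gt;TweetDeck&lt;/a&gt;"
print(html.unescape(source))
# <a href="http://www.tweetdeck.com">TweetDeck</a>
```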
