1
from tweetpy import *
import re
import json
from pprint import pprint
import csv

# Import the necessary methods from "twitter" library
from twitter import Twitter, OAuth, TwitterHTTPError, TwitterStream

# Variables that contain the user credentials to access the Twitter API
ACCESS_TOKEN =  ''
ACCESS_SECRET = ''
CONSUMER_KEY = ''
CONSUMER_SECRET = ''

oauth = OAuth(ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, CONSUMER_SECRET)

# Initiate the connection to Twitter Streaming API
twitter_stream = TwitterStream(auth=oauth)

# Get a sample of the public data flowing through Twitter
iterator = twitter_stream.statuses.filter(track="#kindle",language="en",replies="all")
 # Print each tweet in the stream to the screen

 # Here we set it to stop after getting 10000000 tweets.
 # You don't have to set it to stop, but can continue running
 # the Twitter API to collect data for days or even longer.

tweet_count = 10000000

file = "C:\\Users\\WELCOME\\Desktop\\twitterfeeds.csv"
with open(file,"w") as csvfile:
    fieldnames=['Username','Tweet','Timezone','Timestamp','Location']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for tweet in iterator:
        #pprint(tweet)
        username = str(tweet['user']['screen_name'])
        tweet_text = str(tweet['text'])
        user_timezone = str(tweet['user']['time_zone'])
        tweet_timestamp=str(tweet['created_at'])
        user_location = str(tweet['user']['location'])
        print tweet
        tweet_count -= 1
        writer.writerow({'Username':username,'Tweet':tweet_text,'Timezone':user_timezone,'Location':user_location,'Timestamp':tweet_timestamp})

        if tweet_count <= 0:
            break

I am trying to write tweets to a CSV file with the columns 'Username', 'Tweet', 'Timezone', 'Timestamp', and 'Location'.

But I am getting the following error:

tweet_text = str(tweet['text'])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 139: ordinal not in range(128).

I know it is an encoding issue, but I don't know exactly which variable to encode or where to do it.

2
  • What do you want to do with the offending character(s)? Omit them? Convert them to the closest ASCII equivalent? Convert to a fixed character such as a question mark? Commented Jun 4, 2017 at 16:14
  • The answer may very well be different for Python 2 vs Python 3. Regardless, you're not opening the csv file correctly. Suggest you read the documentation (in both versions) where how to do so correctly is shown. Commented Jun 4, 2017 at 16:48

2 Answers

1
  1. Use Python 3, because the Python 2 csv module does not handle Unicode well.
  2. Call open with the encoding and newline options.
  3. Remove the str conversions (in Python 3, str is already a Unicode string).

Result:

with open(file,"w",encoding='utf8',newline='') as csvfile:
    fieldnames=['Username','Tweet','Timezone','Timestamp','Location']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for tweet in iterator:
        username = tweet['user']['screen_name']
        tweet_text = tweet['text']
        user_timezone = tweet['user']['time_zone']
        tweet_timestamp = tweet['created_at']
        user_location = tweet['user']['location']
            .
            .
            .

If you have to stay on Python 2, install the third-party unicodecsv module to work around the csv module's shortcomings.
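
For reference, the Python 3 approach above can be sketched end to end without a live stream. This is only an illustration under stated assumptions: `sample_tweets`, `write_tweets`, and `read_tweets` are hypothetical names, the sample dictionary is made-up data shaped like the fields the question reads, and the file goes to a temporary directory rather than the path in the question.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the stream iterator: made-up data, not real tweets.
sample_tweets = [
    {
        "user": {"screen_name": "reader1", "time_zone": None, "location": "Zürich"},
        "text": "Loving my #kindle\u2026",  # contains the character from the error
        "created_at": "Sun Jun 04 16:14:00 +0000 2017",
    },
]

FIELDNAMES = ["Username", "Tweet", "Timezone", "Timestamp", "Location"]


def write_tweets(path, tweets):
    """Write tweets to a UTF-8 CSV file without any str() conversion."""
    # encoding="utf8" lets non-ASCII text through; newline="" is what the
    # csv module expects so it can control line endings itself.
    with open(path, "w", encoding="utf8", newline="") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
        writer.writeheader()
        for tweet in tweets:
            writer.writerow({
                "Username": tweet["user"]["screen_name"],
                "Tweet": tweet["text"],
                "Timezone": tweet["user"]["time_zone"],  # None becomes an empty field
                "Timestamp": tweet["created_at"],
                "Location": tweet["user"]["location"],
            })


def read_tweets(path):
    """Read the file back with the same encoding to check the round trip."""
    with open(path, encoding="utf8", newline="") as csvfile:
        return list(csv.DictReader(csvfile))


path = os.path.join(tempfile.mkdtemp(), "twitterfeeds.csv")
write_tweets(path, sample_tweets)
rows = read_tweets(path)
```

Reading the file back with the same encoding is a quick way to confirm that characters such as the ellipsis survive the round trip.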


0

If you really want to convert all your Unicode data to ASCII:

tweet['text'].encode("ascii", "replace") # replaces each non-ASCII character with '?'
or
tweet['text'].encode("ascii", "ignore") # drops each non-ASCII character
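
A small sketch of how the two error handlers differ, assuming Python 3 (where `encode` returns `bytes`, so a `decode` is needed before handing the value to the csv module). The sample string is made up for illustration.

```python
# Made-up sample text ending in the ellipsis character from the error message.
text = "Read more\u2026"

# "replace" substitutes a question mark for each character ASCII cannot represent.
replaced = text.encode("ascii", "replace")

# "ignore" silently drops each character ASCII cannot represent.
ignored = text.encode("ascii", "ignore")

print(replaced)  # b'Read more?'
print(ignored)   # b'Read more'

# In Python 3 the results are bytes; decode to get a str again.
ascii_text = replaced.decode("ascii")
```

Both handlers lose information, so they only make sense when the output really must be plain ASCII.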
