I am getting the following error when writing my DataFrame to a CSV file, and I don't know how to fix it:
File "pandas\_libs\writers.pyx", line 55, in pandas._libs.writers.write_csv_rows
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 147: ordinal not in range(128)

import csv
import random
import re

import matplotlib.pyplot as mlpt
import numpy as np
import pandas as pd
import tweepy
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import subjectivity
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import *

consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True)

fetch_tweets=tweepy.Cursor(api.search, q="#unitedAIRLINES",count=100, lang ="en",since="2018-9-13", tweet_mode="extended").items()
data=pd.DataFrame(data=[[tweet_info.created_at.date(),tweet_info.full_text]for tweet_info in fetch_tweets],columns=['Date','Tweets'])

data.to_csv("Tweets.csv")
cdata=pd.DataFrame(columns=['Date','Tweets'])
for index,row in data.iterrows():
    # Keep only spaces, ASCII letters and digits in the tweet text.
    my_new_string = re.sub('[^ a-zA-Z0-9]', '', row["Tweets"])
    # set_value was deprecated and later removed from pandas; .loc assigns
    # the cell and adds the row if it does not exist yet.
    cdata.loc[index,'Date'] = row["Date"]
    cdata.loc[index,'Tweets'] = my_new_string
#print(cdata.dtypes)
cdata

  • Please share the entire error message, as text. Commented Mar 4, 2020 at 21:28
  • Which line in your code has the error? data.to_csv("Tweets.csv") should default to utf-8, not ascii. Commented Mar 4, 2020 at 21:38
  • There is no specific line in my code; the error is raised inside the pandas library itself, while storing the data in the CSV ("Excel") file. Errors like this one (encoding, UTF-8 and so on) come up often and I don't know how to deal with them. Commented Mar 4, 2020 at 21:48
  • The error message says: File "pandas_libs\writers.pyx", line 55, in pandas._libs.writers.write_csv_rows UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 147: ordinal not in range(128) Commented Mar 4, 2020 at 21:49
  • Unicode character 2026 is a horizontal ellipsis. I haven't dealt with Pandas a lot - I'm not sure it can handle the full Unicode character set in its CSV output. The message would imply not: there is no way to convert that Unicode character to ASCII. Perhaps you can find and remove the Unicode ellipsis in your input data, or generally "clean" your input data to be ASCII only? Commented Mar 4, 2020 at 21:55

2 Answers

1

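I found a solution that works: pass encoding='utf-8' to to_csv, so the line becomes data.to_csv("Tweets.csv", encoding='utf-8'). On Python 2, to_csv defaults to the 'ascii' encoding, which is why the ellipsis character could not be written.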
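
A minimal sketch of this fix, assuming a small made-up DataFrame in place of the fetched tweets and a temporary file in place of Tweets.csv (both are illustrative and not taken from the question):

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the fetched tweets; the text contains U+2026.
data = pd.DataFrame(
    {"Date": ["2018-09-13"], "Tweets": [u"Flight delayed again\u2026"]}
)

# Write to a temporary file so the sketch does not touch a real Tweets.csv.
path = os.path.join(tempfile.mkdtemp(), "Tweets.csv")
data.to_csv(path, encoding="utf-8")

# Read the file back with the same encoding to confirm the ellipsis survived.
restored = pd.read_csv(path, encoding="utf-8", index_col=0)
```

If the file is meant to be opened in Excel, encoding="utf-8-sig" may be worth trying instead, since it writes a byte order mark that Excel typically uses to detect UTF-8.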

0

pandas is failing to encode Unicode data, presumably while generating the CSV output file.

One approach, if you don't really need to process Unicode data, is to simply make conversions on your data to get everything ASCII.
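
As a rough sketch of that first approach (to_ascii is a hypothetical helper name, and the sample string is made up), the text can be normalized and then anything still outside ASCII dropped:

```python
import unicodedata


def to_ascii(text):
    """Return text with non-ASCII characters folded to ASCII or dropped."""
    # NFKD splits compatibility characters into simpler parts where possible
    # (for example, U+2026 becomes three full stops, and an accented letter
    # becomes the base letter plus a combining mark).
    decomposed = unicodedata.normalize("NFKD", text)
    # Whatever still falls outside ASCII is silently discarded.
    return decomposed.encode("ascii", "ignore").decode("ascii")


cleaned = to_ascii(u"one, two, three, \u2026 caf\u00e9")
print(cleaned)
```

This loses information (accents and any symbol without an ASCII equivalent), so it only fits when the non-ASCII content really is not needed.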

Another approach is to make a pass on your data prior to generating the CSV output file to get the UTF-8 encoding of any non-ASCII characters. (You may need to do this at the cell level of your spreadsheet data.)

I'm assuming Python3 here...

>>> s = "one, two, three, \u2026"
>>> print(s)
one, two, three, …
>>> escaped = str(s.encode("utf-8"))[2:-1]
>>> escaped
'one, two, three, \\xe2\\x80\\xa6'
>>> print(escaped)
one, two, three, \xe2\x80\xa6

See also: help() on the codecs module.
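
For illustration, here is a short sketch of the standard error handlers described in the codecs documentation, applied to the same sample string. It assumes Python 3, and the variable names are made up for this example:

```python
s = u"one, two, three, \u2026"

# "strict" is the default handler and raises, which is the failure
# reported in the question.
try:
    s.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Other handlers trade fidelity for an ASCII-only result.
replaced = s.encode("ascii", "replace")             # unencodable -> "?"
escaped = s.encode("ascii", "backslashreplace")     # unencodable -> \uXXXX
xml_refs = s.encode("ascii", "xmlcharrefreplace")   # unencodable -> &#NNNN;

print(strict_failed, replaced, escaped, xml_refs)
```

Which handler is appropriate depends on what will read the CSV afterwards; none of them round-trips the original character the way writing UTF-8 does.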
