UnicodeEncodeError in Python

Question

I am getting the following error and don't know how to fix it. The error message:
File "pandas_libs\writers.pyx", line 55, in pandas._libs.writers.write_csv_rows
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 147: ordinal not in range(128)

import numpy as np
import pandas as pd
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import subjectivity
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import *
import matplotlib.pyplot as mlpt
import tweepy
import csv
import random
import re

consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True)

fetch_tweets = tweepy.Cursor(api.search, q="#unitedAIRLINES", count=100, lang="en", since="2018-9-13", tweet_mode="extended").items()
data = pd.DataFrame(data=[[tweet_info.created_at.date(), tweet_info.full_text] for tweet_info in fetch_tweets], columns=['Date', 'Tweets'])

data.to_csv("Tweets.csv")
cdata = pd.DataFrame(columns=['Date', 'Tweets'])
for index, row in data.iterrows():
    # Keep only spaces, ASCII letters, and digits in the tweet text
    cleaned = re.sub('[^ a-zA-Z0-9]', '', row["Tweets"])
    # .loc is used here because DataFrame.set_value was removed in pandas 1.0
    cdata.loc[index, 'Date'] = row["Date"]
    cdata.loc[index, 'Tweets'] = cleaned
cdata

Which line in your code has the error? data.to_csv("Tweets.csv") should default to utf-8, not ascii.
– tdelaney, Commented Mar 4, 2020 at 21:38
There is no specific line in my code; the error is raised inside the pandas library itself while storing the data in the CSV ("Excel") file. Errors like this (encoding, UTF-8, and so on) come up a lot, and I don't know how to deal with them.
– Abdel Rahman Ayman Hindi, Commented Mar 4, 2020 at 21:48
The error message says: File "pandas_libs\writers.pyx", line 55, in pandas._libs.writers.write_csv_rows UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 147: ordinal not in range(128)
– Abdel Rahman Ayman Hindi, Commented Mar 4, 2020 at 21:49
Unicode character 2026 is a horizontal ellipsis. I haven't dealt with pandas a lot, so I'm not sure it can handle the full Unicode character set in its CSV output. The message would imply not: there is no way to convert that Unicode character to ASCII. Perhaps you can find and remove the Unicode ellipsis in your input data, or generally "clean" your input data to be ASCII only?
– user1441004, Commented Mar 4, 2020 at 21:55

Abdel Rahman Ayman Hindi · Accepted Answer · 2020-03-05 11:43:12Z

1

I found a solution that works: pass encoding='utf-8' to to_csv, so the line becomes: data.to_csv("Tweets.csv", encoding='utf-8')
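
As a minimal sketch of this fix, assuming Python 3 with pandas installed: the sample DataFrame and the temporary file path below are made up for illustration and stand in for the fetched tweets. The file is written with an explicit encoding and read back to check that the ellipsis survives the round trip.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the fetched tweets.
data = pd.DataFrame({"Date": ["2018-09-13"], "Tweets": ["Delayed again\u2026"]})

# Write to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "Tweets.csv")

# Name the encoding explicitly instead of relying on the interpreter default.
data.to_csv(path, encoding="utf-8", index=False)

# Read the file back with the same encoding to confirm the round trip.
restored = pd.read_csv(path, encoding="utf-8")
print(restored.loc[0, "Tweets"])
```

If the CSV is meant to be opened in Excel, encoding="utf-8-sig" may be worth trying instead, since it writes a byte order mark that Excel generally uses to recognise UTF-8.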

user1441004 · Accepted Answer · 2020-03-04 22:54:17Z

0

pandas is tripping up on Unicode data, presumably while generating the CSV output file.

One approach, if you don't really need to process Unicode data, is to simply convert your data so that everything is ASCII.
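
One possible way to do that conversion, sketched under the assumption of Python 3: the helper name to_ascii is hypothetical, and using NFKD normalisation followed by dropping whatever is still non-ASCII is just one choice among several. Note that this is lossy, because characters with no ASCII equivalent are silently discarded.

```python
import unicodedata


def to_ascii(text):
    """Return an ASCII-only approximation of text (hypothetical helper)."""
    # NFKD replaces characters with their compatibility decomposition where
    # one exists, e.g. the ellipsis U+2026 becomes three full stops.
    decomposed = unicodedata.normalize("NFKD", text)
    # Anything still outside ASCII (combining accents, emoji) is dropped.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("caf\u00e9 delayed\u2026"))  # prints: cafe delayed...
```

Applied to the DataFrame from the question, this could be used as data["Tweets"].map(to_ascii) before calling to_csv.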

Another approach is to make a pass over your data before generating the CSV output file and replace any non-ASCII characters with their UTF-8 encoding. (You may need to do this at the cell level of your spreadsheet data.)

I'm assuming Python 3 here...

>>> s = "one, two, three, \u2026"
>>> print(s)
one, two, three, …
>>> escaped = str(s.encode("utf-8"))[2:-1]  # named "escaped" to avoid shadowing the built-in ascii()
>>> escaped
'one, two, three, \\xe2\\x80\\xa6'
>>> print(escaped)
one, two, three, \xe2\x80\xa6
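
For context, the failure in the question can be reproduced without pandas. This is a small sketch, again assuming Python 3: encoding the same string with the ascii codec raises UnicodeEncodeError, while the utf-8 codec turns the ellipsis into three bytes.

```python
text = "one, two, three, \u2026"

failed = None
reason = None
try:
    # The ascii codec only covers code points 0-127, so U+2026 cannot be encoded.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # The exception records which slice of the input could not be encoded.
    failed = exc.object[exc.start:exc.end]
    reason = exc.reason

print(repr(failed), reason)

# UTF-8 can represent every Unicode code point; the ellipsis takes three bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)
```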