UnicodeDecodeError reading a CSV file

Question

I use the following code to read a CSV file:

import pandas as pd

address = r'C:\Users\ssadangi\Desktop\Lynda Python data analytics\Ch02\02_05\Superstore-Sales.csv'
df = pd.read_csv(address, index_col='Order Date', parse_dates=True)

The code gives me this error:

UnicodeDecodeError                        Traceback (most recent call last)
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 16: invalid start byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-28-3fa20db347ab> in <module>()
----> 1 df=pd.read_csv('Superstore-Sales.csv',index_col='Order Date',parse_dates=True)
~\Documents\Softwares\Anaconda\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    707                     skip_blank_lines=skip_blank_lines)
    708 
--> 709         return _read(filepath_or_buffer, kwds)
    710 
    711     parser_f.__name__ = name
~\Documents\Softwares\Anaconda\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    453 
    454     try:
--> 455         data = parser.read(nrows)
    456     finally:
    457         parser.close()
~\Documents\Softwares\Anaconda\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1067                 raise ValueError('skipfooter not supported for iteration')
   1068 
-> 1069         ret = self._engine.read(nrows)
   1070 
   1071         if self.options.get('as_recarray'):
~\Documents\Softwares\Anaconda\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1837     def read(self, nrows=None):
   1838         try:
-> 1839             data = self._reader.read(nrows)
   1840         except StopIteration:
   1841             if self._first_chunk:
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 16: invalid start byte

Without the original data, there is not much people can tell you.
– user2722968, Commented May 15, 2018 at 6:25
Welcome to Stack Overflow! I have edited your question for readability using formatting codes (please see the editing help for more information on formatting). I've also tried to make the title more friendly, and added the [python] tag -- you should always use the base language tag.
– Cris Luengo, Commented May 15, 2018 at 6:32
Consider editing your question and adding the portion of the CSV file that causes the error -- try removing parts of the CSV file to make it as short as possible while still causing the error, then show us that.
– Cris Luengo, Commented May 15, 2018 at 6:34
Try setting an encoding, e.g. encoding='latin1' or encoding='iso-8859-1'.
– Rakesh, Commented May 15, 2018 at 6:50
Thanks for editing the question, @Cris Luengo. This was my first question on Stack Overflow. I tried using the formatting codes, but they didn't seem to work (even tagging didn't seem to work). Some issue with the network or browser. Solved by setting encoding='ansi'. Thanks, @Rakesh.
– Siddhant Sadangi, Commented May 16, 2018 at 10:51

Parag Jain · Accepted Answer · 2019-04-06 04:59:40Z

This is an encoding error: the file is not valid UTF-8, which is the encoding pd.read_csv() assumes by default. When you call pd.read_csv() you have to specify the file's encoding as well:

address = r'C:\Users\ssadangi\Desktop\Lynda Python data analytics\Ch02\02_05\Superstore-Sales.csv'
# Pass the file's encoding explicitly via the encoding parameter
df = pd.read_csv(address, index_col='Order Date', parse_dates=True, encoding='latin1')
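
To illustrate why the read fails and why latin1 helps, here is a minimal sketch using only the standard library. The sample bytes are made up for the demonstration (the real file's contents are unknown), and first_working_encoding is a hypothetical helper, not part of pandas. The candidate list is also an assumption. Byte 0xae cannot start a UTF-8 sequence, but in Latin-1 and Windows-1252 it is the registered-trademark sign.

```python
import os
import tempfile

# Made-up sample containing the byte from the traceback (0xAE).
raw = b"Product\nAcme\xae Stapler\n"

# Decoding as UTF-8 fails: 0xAE is a continuation byte, not a start byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Latin-1 maps every byte to a character, so this always succeeds.
latin1_text = raw.decode("latin1")


def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin1")):
    """Hypothetical helper: return the first candidate encoding that
    decodes the whole file without error, or None if none does.

    Reads the entire file into memory, so it suits small files only.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


# Write the sample to a temporary file and probe it.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "wb") as fh:
        fh.write(raw)
    detected = first_working_encoding(sample_path)

print(utf8_ok)      # False
print(latin1_text)  # second line reads "Acme® Stapler"
print(detected)     # cp1252
```

The detected name could then be passed as encoding= to pd.read_csv(). Note that a successful decode does not prove the encoding is the right one: latin1 accepts any byte sequence, so it is best kept as the last fallback, and the resulting text should be checked by eye.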

