
I use the following code to read a CSV file:

import pandas as pd

address = r'C:\Users\ssadangi\Desktop\Lynda Python data analytics\Ch02\02_05\Superstore-Sales.csv'
df = pd.read_csv(address, index_col='Order Date', parse_dates=True)

The code gives me this error:

UnicodeDecodeError                        Traceback (most recent call last)
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 16: invalid start byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-28-3fa20db347ab> in <module>()
----> 1 df=pd.read_csv('Superstore-Sales.csv',index_col='Order Date',parse_dates=True)
~\Documents\Softwares\Anaconda\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    707                     skip_blank_lines=skip_blank_lines)
    708 
--> 709         return _read(filepath_or_buffer, kwds)
    710 
    711     parser_f.__name__ = name
~\Documents\Softwares\Anaconda\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    453 
    454     try:
--> 455         data = parser.read(nrows)
    456     finally:
    457         parser.close()
~\Documents\Softwares\Anaconda\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1067                 raise ValueError('skipfooter not supported for iteration')
   1068 
-> 1069         ret = self._engine.read(nrows)
   1070 
   1071         if self.options.get('as_recarray'):
~\Documents\Softwares\Anaconda\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1837     def read(self, nrows=None):
   1838         try:
-> 1839             data = self._reader.read(nrows)
   1840         except StopIteration:
   1841             if self._first_chunk:
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xae in position 16: invalid start byte
  • Without the original data, there is not much people can tell you. Commented May 15, 2018 at 6:25
  • Welcome to Stack Overflow! I have edited your question for readability using formatting codes (please see the editing help for more information on formatting). I've also tried to make the title more friendly, and added the [python] tag -- you should always use the base language tag. Commented May 15, 2018 at 6:32
  • Consider editing your question and adding the portion of the CSV file that causes the error -- try removing parts of the CSV file to make it as short as possible while still causing the error, then show us that. Commented May 15, 2018 at 6:34
  • Try setting an encoding, e.g. encoding='latin1' or encoding='iso-8859-1'. Commented May 15, 2018 at 6:50
  • Thanks for editing the question @Cris Luengo. This was my first question on Stack Overflow. I tried using the formatting codes, but they didn't seem to work (even tagging didn't seem to work). Probably some issue with the network or browser. Solved by setting encoding='ansi'. Thanks @Rakesh Commented May 16, 2018 at 10:51

1 Answer

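This is an encoding error: pd.read_csv() assumes UTF-8 by default, and your file contains a byte (0xae) that is not valid UTF-8 at that position. You have to tell pd.read_csv() which encoding the file actually uses: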

import pandas as pd

address = r'C:\Users\ssadangi\Desktop\Lynda Python data analytics\Ch02\02_05\Superstore-Sales.csv'
# Pass the encoding explicitly instead of relying on the UTF-8 default
df = pd.read_csv(address, index_col='Order Date', parse_dates=True, encoding='latin1')
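
If you are not sure which encoding the file uses, you can check a few candidates before calling pd.read_csv(). The following is only a rough sketch using the standard library: the sample row, the helper name first_working_encoding and the candidate list are made up for illustration and are not taken from your file. It assumes the offending byte is something like the ® sign, which is what 0xae means in latin1 and cp1252.

```python
import os
import tempfile

# Hypothetical sample data (not from the original CSV): 0xae is the
# registered-trademark sign in latin1/cp1252 but an invalid start byte in UTF-8.
raw = b"Order Date,Product\n2018-05-15,Xerox\xae 1967\n"


def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin1")):
    """Return the first candidate encoding that decodes the whole file, or None."""
    with open(path, "rb") as fh:
        data = fh.read()
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


# Write the sample bytes to a temporary file so the helper has something to read.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    detected = first_working_encoding(path)
finally:
    os.remove(path)

print(detected)
```

The name it returns can then be passed as encoding=... to pd.read_csv(). Keep in mind that latin1 maps every possible byte to some character, so it never raises an error; a successful decode only means the file is readable, not that every character is the one the author intended.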