I need some help. I am using the following code:

%matplotlib inline
import csv
from datetime import datetime
import numpy as np
import pandas as pd
from IPython.display import display
with open (r'C:\Users\Michel Spiero\Desktop\Analise Python Optitex\Analytics Optitex\base_entrada_python_v2.csv') as csvfile:
    readCSV =csv.reader(csvfile, delimiter=';')

entrada_arquivo = pd.read_csv(r'C:\Users\Michel Spiero\Desktop\Analise Python Optitex\Analytics Optitex\base_entrada_python_v2.csv')
entrada_arquivo.head(10)

Then I get this error:

ParserError                               Traceback (most recent call last)
<ipython-input-2-248d3ffc3e4b> in <module>()
      3     readCSV =csv.reader(csvfile, delimiter=';')
      4 
----> 5 entrada_arquivo = pd.read_csv(r'C:\Users\Michel Spiero\Desktop\Analise Python Optitex\Analytics Optitex\base_entrada_python_v2.csv')
      6 entrada_arquivo.head(10)
      7 

C:\Users\Michel Spiero\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    653                     skip_blank_lines=skip_blank_lines)
    654 
--> 655         return _read(filepath_or_buffer, kwds)
    656 
    657     parser_f.__name__ = name

C:\Users\Michel Spiero\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    409 
    410     try:
--> 411         data = parser.read(nrows)
    412     finally:
    413         parser.close()

C:\Users\Michel Spiero\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
    980                 raise ValueError('skipfooter not supported for iteration')
    981 
--> 982         ret = self._engine.read(nrows)
    983 
    984         if self.options.get('as_recarray'):

C:\Users\Michel Spiero\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1717     def read(self, nrows=None):
   1718         try:
-> 1719             data = self._reader.read(nrows)
   1720         except StopIteration:
   1721             if self._first_chunk:

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read (pandas\_libs\parsers.c:10862)()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory (pandas\_libs\parsers.c:11138)()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows (pandas\_libs\parsers.c:11884)()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows (pandas\_libs\parsers.c:11755)()

pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error (pandas\_libs\parsers.c:28765)()

ParserError: Error tokenizing data. C error: Expected 9 fields in line 3, saw 11

My goal is to take this CSV file (which is separated by ;) and import it into a DataFrame.

It is important to note that the numbers are in the Portuguese format, so the decimals are separated by a comma and not by a dot.

Can someone help me? It is a basic question, but I am stuck.

  • Was your question addressed? Can you please close it and accept an answer if it was? Thanks. Commented Sep 12, 2017 at 0:43

1 Answer

Your CSV file is quoted, so it needs a little extra parsing. A regex separator that splits on both ; and ", followed by a dropna operation to discard the all-empty columns this produces, should do it.

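path = r'C:\Users\Michel Spiero\Desktop\Analise Python Optitex\Analytics Optitex\base_entrada_python_v2.csv'

with open(path, 'r', encoding='utf-8') as f:
    # Split on either ';' or '"'; a regex separator requires the Python engine.
    entrada_arquivo = (
        pd.read_csv(f, sep=';|"', engine='python')
        .dropna(how='all', axis=1)  # drop the empty columns left by the quotes
    )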

entrada_arquivo.head(5)

   Cliente  Numero           N Fantasia  Serie Docto.  Loja  Data Saida  \
0     1293   47367                  NaN             1     1     42009.0   
1     1293   47367                  NaN             1     1     42009.0   
2    15043   47368  OTICA DE RESPLENDOR             1     1     42010.0   
3    15043   47368  OTICA DE RESPLENDOR             1     1     42010.0   
4    15043   47368  OTICA DE RESPLENDOR             1     1     42010.0   

                                      Nome  DT Emissao Tipo da nota  \
0  DUBLATEX MC COM DE ART VIAG E CAL LTDAE       42009            B   
1  DUBLATEX MC COM DE ART VIAG E CAL LTDAE       42009            B   
2            FRANCISMAR CORREA LOURENCO ME       42009            N   
3            FRANCISMAR CORREA LOURENCO ME       42009            N   
4            FRANCISMAR CORREA LOURENCO ME       42009            N   

   Cond. Pagto   ...      Total Vendedor 1.1 Vendedor 2  Data Saida.1  \
0            1   ...     2204,1          NaN        NaN       42009.0   
1            1   ...    1598,42          NaN        NaN       42009.0   
2          322   ...      173,8         65.0        NaN       42010.0   
3          322   ...     245,85         65.0        NaN       42010.0   
4          322   ...      491,7         65.0        NaN       42010.0   

   Vlr.Bruto Vlr.ICMS Estado.1 Cond. Pagto.1 Volume 1 Transp.  
0    3802,52        0       SP             1        1      43  
1    3802,52        0       SP             1        1      43  
2    3638,02   397,58       MG           322        6       5  
3    3638,02   397,58       MG           322        6       5  
4    3638,02   397,58       MG           322        6       5  

[5 rows x 39 columns]
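
Note that columns such as Total and Vlr.Bruto in the output above still hold text like 2204,1, because the comma decimals were not converted. A minimal sketch of one way to handle that, assuming a plain semicolon-separated layout: pass sep and decimal to read_csv. The in-memory sample below is made up for illustration and is not taken from the real file, so the quoting in the real file may still need the treatment shown above.

```python
import io

import pandas as pd

# Hypothetical sample: semicolon-separated fields, comma as decimal mark.
sample = (
    "Cliente;Nome;Total\n"
    "1293;DUBLATEX;2204,1\n"
    "15043;FRANCISMAR;173,8\n"
)

# sep=';' sets the field delimiter; decimal=',' makes pandas read
# '2204,1' as the float 2204.1 instead of leaving it as a string.
df = pd.read_csv(io.StringIO(sample), sep=";", decimal=",")

print(df.dtypes)
print(df)
```

With the default sep=',' the decimal commas are treated as field separators, which is a likely reason for the "Expected 9 fields in line 3, saw 11" error in the question.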
3 Comments

Didn't work. I got this error: UnicodeDecodeError Traceback (most recent call last) .......................... UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 146: invalid start byte
@user1922364 Post a sample of your CSV data in the question.
@user1922364 Your CSV was quoted, making things problematic. Take a look now.
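
On the UnicodeDecodeError mentioned in the comments: byte 0x96 is not valid UTF-8, but it is an en dash in Windows-1252 (cp1252), which is a common encoding for files saved on Windows. The actual encoding of the file is not known here, so treat cp1252 as an assumption to try rather than a confirmed fix. A small sketch with made-up data:

```python
import io

import pandas as pd

# Hypothetical bytes: the en dash (U+2013) is encoded as 0x96 in cp1252.
raw = "Nome;Total\nA \u2013 B;1,5\n".encode("cp1252")

# Decoding the same bytes as UTF-8 fails, as in the comment above.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc)

# Telling pandas the assumed encoding lets the file be read.
df = pd.read_csv(io.BytesIO(raw), sep=";", decimal=",", encoding="cp1252")
print(df)
```

If cp1252 produces odd characters, encoding='latin-1' is another option that is often tried for files like this.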
