
I have been trying for a few hours to read this file. I have researched solutions and tried applying them, but they did not work. The file itself opens fine in Excel, but I cannot read it with Pandas.

The response keeps returning the same error: ParserError: Expected 3 fields in line 5, saw 63

I have seen a few other questions on this topic, but none of the solutions to those specific questions has solved my issue.

Does anyone know why I am failing to read this file and how I can fix it? Thank you

IN:

data=pd.read_csv('API_EN.ATM.CO2E.PC_DS2_en_csv_v2_10181020.csv',
                 header=None,
                 engine='python',
                error_bad_lines=True)

OUT:

ParserError                               Traceback (most recent call last)
<ipython-input-96-0d42116a039d> in <module>()
      2                  header=None,
      3                  engine='python',
----> 4                 error_bad_lines=True)

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    444 
    445     try:
--> 446         data = parser.read(nrows)
    447     finally:
    448         parser.close()

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1034                 raise ValueError('skipfooter not supported for iteration')
   1035 
-> 1036         ret = self._engine.read(nrows)
   1037 
   1038         # May alter columns / col_dict

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, rows)
   2264             content = content[1:]
   2265 
-> 2266         alldata = self._rows_to_cols(content)
   2267         data = self._exclude_implicit_index(alldata)
   2268 

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _rows_to_cols(self, content)
   2907                     msg += '. ' + reason
   2908 
-> 2909                 self._alert_malformed(msg, row_num + 1)
   2910 
   2911         # see gh-13320

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _alert_malformed(self, msg, row_num)
   2674 
   2675         if self.error_bad_lines:
-> 2676             raise ParserError(msg)
   2677         elif self.warn_bad_lines:
   2678             base = 'Skipping line {row_num}: '.format(row_num=row_num)

ParserError: Expected 3 fields in line 5, saw 63

Here is a sample of the CSV file:

"Country_Name","Country_Code","Indicator_Name","Indicator_Code","1960","1961","1962","1963","1964","1965","1966","1967","1968","1969","1970","1971","1972","1973","1974","1975","1976","1977","1978","1979","1980","1981","1982","1983","1984","1985","1986","1987","1988","1989","1990","1991","1992","1993","1994","1995","1996","1997","1998","1999","2000","2001","2002","2003","2004","2005","2006","2007","2008","2009","2010","2011","2012","2013","2014","2015","2016","2017",
"Aruba","ABW","CO2 emissions (metric tons per capita)","EN.ATM.CO2E.PC","","","","","","","","","","","","","","","","","","","","","","","","","","","2.86831939212055","7.23519803341258","10.0261792105306","10.6347325992922","26.3745032100275","26.0461298009966","21.4425588041328","22.000786163522","21.0362451108214","20.7719361585578","20.3183533653846","20.4268177083943","20.5876691453648","20.311566765912","26.1948752380219","25.9340244138733","25.6711617820448","26.4204520857169","26.5172934158421","27.200707780588","26.9482604728658","27.8955739972338","26.2308466448946","25.9158329472761","24.6705288731078","24.5058352032767","13.1555416906324","8.35129425218293","8.408362637892","","","",
  • Try with data = pd.read_csv('API_EN.ATM.CO2E.PC_DS2_en_csv_v2_10181020.csv', error_bad_lines=False) Commented Nov 1, 2018 at 17:53
  • Share a sample of the CSV you are trying to read. Commented Nov 1, 2018 at 17:59
  • Are you sure you don't have a header that spans multiple rows that you need to skip? The parser tokenizes the data based on the first row, so if that row contains more or fewer fields than the rest of the file, it won't parse correctly. The skiprows argument will help with this. Commented Nov 1, 2018 at 18:02

2 Answers


Changing your code to

data=pd.read_csv('API_EN.ATM.CO2E.PC_DS2_en_csv_v2_10181020.csv', header=None, engine='python', error_bad_lines=False)

will let the import run, but the result will not be correct. There is probably something wrong with your CSV or with the separator used. Could you post the 5th line of the CSV you are trying to import? Does the last column, for example, contain text with commas? How many columns do you expect: 3, 63, or something else?
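One way to answer those questions yourself is to count the fields on the first few rows before handing the file to pandas. The following is only a sketch using the standard csv module: the sample text and the field_counts helper are made up for illustration, and the metadata lines are an assumption about what the top of such a file might look like, not a copy of your file.

```python
import csv
import io

# Hypothetical stand-in for the real file: short metadata rows and blank
# lines come first, and the real header only appears on the fifth line.
sample = (
    '"Data Source","World Development Indicators",\n'
    '\n'
    '"Last Updated Date","2018-10-18",\n'
    '\n'
    '"Country_Name","Country_Code","2013","2014",\n'
    '"Aruba","ABW","8.35129425218293","8.408362637892",\n'
)


def field_counts(handle, limit=10):
    """Return (row_number, field_count) for the first `limit` rows."""
    counts = []
    for row_number, row in enumerate(csv.reader(handle), start=1):
        if row_number > limit:
            break
        counts.append((row_number, len(row)))
    return counts


counts = field_counts(io.StringIO(sample))
for row_number, n_fields in counts:
    print(row_number, n_fields)
```

For a file on disk, pass open(path, newline='') instead of the StringIO object. A jump in the field count from one row to the next marks the spot where the real table starts; the trailing comma on each row counts as one extra, empty field.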


5 Comments

This is the 5th line of the CSV file:
I had tried 'error_bad_lines=False' before and it did not work. I tried your line of code anyway, and it skipped every line in the file. Each skipped line reports something like: "Expected 3 fields in line 21, saw 63"
So, looking at your sample data, you definitely have more than 3 columns... I count 62, but the trailing comma will make it 63. Is the header row really the first line of the CSV?
You're right! The header was not the first line. In the CSV file, I deleted all the rows above the header and the file read correctly in Python. Thank you
You can also do this in pd.read_csv() with the skiprows argument. In this case probably data=pd.read_csv('API_EN.ATM.CO2E.PC_DS2_en_csv_v2_10181020.csv', skiprows=4) (or 5, I’m not sure if the error you got is 0-indexed or not).
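A minimal sketch of the skiprows approach, run on an in-memory sample rather than the real file. The metadata lines are an assumed layout (two short rows separated by blank lines, with the header on line 5), and the table is cut down to two year columns for brevity. Dropping the "Unnamed" column is an extra step to deal with the trailing comma on each row.

```python
import io

import pandas as pd

# Hypothetical, abbreviated stand-in for the real file: four lines of
# preamble (two of them blank), then the header and one data row.
sample = (
    '"Data Source","World Development Indicators",\n'
    '\n'
    '"Last Updated Date","2018-10-18",\n'
    '\n'
    '"Country_Name","Country_Code","2013","2014",\n'
    '"Aruba","ABW","8.35129425218293","8.408362637892",\n'
)

# Skip the four lines above the header so the parser sizes the table
# from the real header row instead of the short metadata row.
df = pd.read_csv(io.StringIO(sample), skiprows=4)

# The trailing comma on every row produces one empty, auto-named
# "Unnamed: N" column; keep only the named columns.
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]

print(df.columns.tolist())
```

With the real file, replace io.StringIO(sample) with the file name and adjust skiprows to the number of lines that sit above the header.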

Try changing the value of the sep parameter in pd.read_csv().
