I cannot read this CSV file using pd.read_csv: different number of expected values

Question

I have been trying for a few hours to read this file. I have researched solutions and tried applying them, but they did not work. The file itself opens fine in Excel, but I cannot read it with Pandas.

The response keeps returning the same error: ParserError: Expected 3 fields in line 5, saw 63

I have seen a few other questions on this topic, but none of the solutions to those specific questions has solved my issue.

Does anyone know why I am failing to read this file and how I can fix it? Thank you

IN:

data=pd.read_csv('API_EN.ATM.CO2E.PC_DS2_en_csv_v2_10181020.csv',
                 header=None,
                 engine='python',
                error_bad_lines=True)

OUT:

ParserError                               Traceback (most recent call last)
<ipython-input-96-0d42116a039d> in <module>()
      2                  header=None,
      3                  engine='python',
----> 4                 error_bad_lines=True)

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    444 
    445     try:
--> 446         data = parser.read(nrows)
    447     finally:
    448         parser.close()

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1034                 raise ValueError('skipfooter not supported for iteration')
   1035 
-> 1036         ret = self._engine.read(nrows)
   1037 
   1038         # May alter columns / col_dict

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, rows)
   2264             content = content[1:]
   2265 
-> 2266         alldata = self._rows_to_cols(content)
   2267         data = self._exclude_implicit_index(alldata)
   2268 

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _rows_to_cols(self, content)
   2907                     msg += '. ' + reason
   2908 
-> 2909                 self._alert_malformed(msg, row_num + 1)
   2910 
   2911         # see gh-13320

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _alert_malformed(self, msg, row_num)
   2674 
   2675         if self.error_bad_lines:
-> 2676             raise ParserError(msg)
   2677         elif self.warn_bad_lines:
   2678             base = 'Skipping line {row_num}: '.format(row_num=row_num)

ParserError: Expected 3 fields in line 5, saw 63

Here is a sample of the CSV file:

"Country_Name","Country_Code","Indicator_Name","Indicator_Code","1960","1961","1962","1963","1964","1965","1966","1967","1968","1969","1970","1971","1972","1973","1974","1975","1976","1977","1978","1979","1980","1981","1982","1983","1984","1985","1986","1987","1988","1989","1990","1991","1992","1993","1994","1995","1996","1997","1998","1999","2000","2001","2002","2003","2004","2005","2006","2007","2008","2009","2010","2011","2012","2013","2014","2015","2016","2017",
"Aruba","ABW","CO2 emissions (metric tons per capita)","EN.ATM.CO2E.PC","","","","","","","","","","","","","","","","","","","","","","","","","","","2.86831939212055","7.23519803341258","10.0261792105306","10.6347325992922","26.3745032100275","26.0461298009966","21.4425588041328","22.000786163522","21.0362451108214","20.7719361585578","20.3183533653846","20.4268177083943","20.5876691453648","20.311566765912","26.1948752380219","25.9340244138733","25.6711617820448","26.4204520857169","26.5172934158421","27.200707780588","26.9482604728658","27.8955739972338","26.2308466448946","25.9158329472761","24.6705288731078","24.5058352032767","13.1555416906324","8.35129425218293","8.408362637892","","","",

Try with data = pd.read_csv('API_EN.ATM.CO2E.PC_DS2_en_csv_v2_10181020.csv', error_bad_lines=False)
– Riccardo, Commented Nov 1, 2018 at 17:53
Are you sure you don't have a header that spans multiple rows that you need to skip? The parser will tokenize data based on the first row, so if that contains more or fewer fields than the rest of the file it won't parse correctly. The skiprows argument will help with this.
– ALollz, Commented Nov 1, 2018 at 18:02
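One way to check this without opening the file by hand is to count the fields on each line. The sketch below uses only the standard csv module on a made-up miniature of the file. The metadata lines in `sample` are an assumption about the layout, not the actual contents of the file in the question.

```python
import csv
import io

# Hypothetical miniature of the layout: short metadata lines and blank
# lines first, then the real header and data rows (all with a trailing comma).
sample = (
    '"Data Source","World Development Indicators",\n'
    '\n'
    '"Last Updated Date","2018-10-18",\n'
    '\n'
    '"Country_Name","Country_Code","1960","1961",\n'
    '"Aruba","ABW","","2.5",\n'
    '"Andorra","AND","1.0","",\n'
)

# Number of fields on every physical line; a trailing comma adds one empty field.
counts = [len(row) for row in csv.reader(io.StringIO(sample))]
print(counts)

# The first line that has the widest field count is a reasonable guess for the header.
header_index = counts.index(max(counts))
print(header_index)  # number of lines to skip before the header
```

If the first line has fewer fields than the later ones, the parser's guess at the column count comes from the wrong line, which matches the "Expected 3 fields ... saw 63" message.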

Niels Henkens · Accepted Answer · 2018-11-01 18:04:11Z

1

Changing your code to

data=pd.read_csv('API_EN.ATM.CO2E.PC_DS2_en_csv_v2_10181020.csv', header=None, engine='python', error_bad_lines=False)

will import your CSV, but won't import it correctly. There is probably something off with your CSV and the separator used. Could you post the 5th line of the CSV you are trying to import? Does the last column, for example, contain text with commas? How many columns do you expect: 3, 63, or something else?


5 Comments

Tarquin Tate Over a year ago

This is the 5th line of the CSV file:

Tarquin Tate Over a year ago

I had tried error_bad_lines=False before and it did not work. I tried your line of code anyway, and it skipped every line in the file. Each skipped line reports: "Expected 3 fields in line 21, saw 63"

Niels Henkens Over a year ago

So, looking at your sample data, you definitely have more than 3 columns... I count 62, but the trailing comma will make it 63. Is the header row really the first line of the CSV?

Tarquin Tate Over a year ago

You're right! The header was not the first line. In the CSV file, I deleted all the rows above the header and the file read correctly in Python. Thank you

Niels Henkens Over a year ago

You can also do this in pd.read_csv() with the skiprows argument. In this case probably data=pd.read_csv('API_EN.ATM.CO2E.PC_DS2_en_csv_v2_10181020.csv', skiprows=4) (or 5, I'm not sure if the error you got is 0-indexed or not).
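A minimal sketch of the skiprows approach, run against an in-memory stand-in rather than the real file. The contents of `sample` are hypothetical; in this sketch the header sits on the fifth physical line, so four lines are skipped. For the real file the right number may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: four lines of metadata/blank lines,
# then the header and two data rows, each ending in a trailing comma.
sample = (
    '"Data Source","World Development Indicators",\n'
    '\n'
    '"Last Updated Date","2018-10-18",\n'
    '\n'
    '"Country_Name","Country_Code","1960","1961",\n'
    '"Aruba","ABW","","2.5",\n'
    '"Andorra","AND","1.0","",\n'
)

# Skip the lines above the header so the column count is taken from the header row.
data = pd.read_csv(io.StringIO(sample), skiprows=4)

# The trailing comma produces one extra, completely empty column; drop it.
data = data.dropna(axis=1, how="all")

print(data.columns.tolist())
print(len(data))
```

This avoids editing the CSV by hand, so the same call keeps working if the file is downloaded again.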

Destiny · Accepted Answer · 2018-11-01 17:53:11Z

0

Try changing the value of sep parameter in pd.read_csv().

