ParseError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file. (read_csv)

Question

I cannot get the pandas read_csv method to work on Kaggle. The error I get is:

ParseError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

I found some suggestions for this (read_excel, reading by column), but they do not help me solve this error.

Searching for this error message turns up several good options: finding exactly which line in the CSV is bad, switching to the Python engine, or defining a different line terminator (\n instead of \r\n). See stackoverflow.com/questions/33998740/…
– tdelaney, Commented Nov 10, 2020 at 5:22
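The first option in that comment, finding which line is bad, can be sketched with the standard library alone. This is a minimal illustration, not the method from the linked thread: the helper name find_irregular_rows and the sample data are hypothetical, and it assumes a "bad" line is one whose field count differs from the most common count.

```python
import csv
import io
from collections import Counter


def find_irregular_rows(lines, delimiter=","):
    """Return (line_number, field_count) for rows whose field count is unusual.

    `lines` is any iterable of text lines, e.g. an open file or io.StringIO.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    # reader.line_num is the number of source lines read so far; blank rows are skipped.
    counts = [(reader.line_num, len(row)) for row in reader if row]
    if not counts:
        return []
    # Assumption: the most common field count is the intended one.
    expected = Counter(n for _, n in counts).most_common(1)[0][0]
    return [(line_no, n) for line_no, n in counts if n != expected]


# Hypothetical sample: line 3 has an extra, unquoted comma.
sample = io.StringIO("id,user,tweet\n1,alice,hello\n2,bob,hi,there\n3,carol,bye\n")
print(find_irregular_rows(sample))  # [(3, 4)]
```

For a real file, something like open(path, newline="", encoding="utf-8") could be passed in place of the StringIO object; the right encoding for a given dataset is not something this sketch can know.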
I am using the donald_trump CSV from the kaggle.com/manchunhui/us-election-2020-tweets notebook. The accepted answer in the thread you sent me is not a good solution, and it does not give enough explanation to make sense. — Ahmet Onur Solmaz
– Ahmet Onur Solmaz, Commented Nov 10, 2020 at 5:32
I found a good solution to this problem: adding engine='python' as a parameter to read_csv, i.e. pd.read_csv('csv_path', engine='python'). OK, but how does this solve the problem? — Ahmet Onur Solmaz
– Ahmet Onur Solmaz, Commented Nov 10, 2020 at 5:37
I don't know in this case, but CSV is generally loosely defined: some encoders may miss certain escapes, which causes problems for other parsers. Normally a CSV has \r\n line endings, but if a lone \r is floating around in there, or there are other anomalies, some parsers will choke. Pandas normally uses a C parser that is not too forgiving, but it can also use a Python one that handles more cases. I can't say for sure what is messed up in this case. — tdelaney
– tdelaney, Commented Nov 10, 2020 at 5:43
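One way to check the stray \r hypothesis is to count line-ending styles in the raw bytes. This is only a diagnostic sketch: the helper name count_line_endings and the sample bytes are hypothetical, and a nonzero lone_cr count does not prove that this is what trips the C parser.

```python
def count_line_endings(data: bytes) -> dict:
    """Count \\r\\n pairs, plus \\r and \\n bytes that are not part of a pair."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lone_cr": data.count(b"\r") - crlf,
        "lone_lf": data.count(b"\n") - crlf,
    }


# Hypothetical sample with one stray \r inside the last record.
sample = b"id,tweet\r\n1,hello\r\n2,bro\rken\r\n"
print(count_line_endings(sample))  # {'crlf': 3, 'lone_cr': 1, 'lone_lf': 0}
```

For a real file, the bytes could come from open(path, "rb").read(), memory permitting.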
Hmm, thanks, I understand now. I did not know that pandas uses a C engine to read CSV. So using the Python engine lets us scan and read CSV (maybe Excel too) more flexibly. — Ahmet Onur Solmaz
– Ahmet Onur Solmaz, Commented Nov 10, 2020 at 5:50

alemol · Accepted Answer · 2022-04-26 22:15:41Z

I solved the same problem just by adding engine='python':

df = pd.read_csv(fname, sep='\t',
                 engine='python',
                 header=None)
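If the slower Python engine should only be used when needed, a small wrapper can try the default C parser first and retry on failure. This is a sketch under assumptions: read_csv_with_fallback is a hypothetical helper name, the sample file is made up, and it relies on pandas raising pandas.errors.ParserError for the tokenizing error quoted in the question.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, **kwargs):
    """Try the default C parser; retry with engine='python' on a ParserError.

    Do not pass `engine` in kwargs, since the fallback sets it itself.
    """
    try:
        return pd.read_csv(path, **kwargs)
    except pd.errors.ParserError:
        # The Python engine is slower but, per the comments above, more forgiving.
        return pd.read_csv(path, engine="python", **kwargs)


# Demo on a small, well-formed tab-separated file (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "sample.tsv")
    with open(fname, "w", newline="") as fh:
        fh.write("1\talice\thello\n2\tbob\thi\n")
    df = read_csv_with_fallback(fname, sep="\t", header=None)

print(df.shape)  # (2, 3)
```

The wrapper takes a path rather than an open file object so that the second attempt starts reading from the beginning of the file.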

