Reading variable number of columns in pandas

Question

I have a poorly formatted delimited file in which some rows contain stray delimiters, so different rows appear to have inconsistent numbers of columns.

When I run

pd.read_csv('patentHeader.txt', sep="|", header=0)

the process dies with this error:

CParserError: Error tokenizing data. C error: Expected 9 fields in line 1034558, saw 15
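The failure can be reproduced on a small in-memory sample. The data below is made up for illustration: a 3-field header followed by one row where stray delimiters produce 5 fields. The exception class name is the one used by recent pandas versions; the question's CParserError appears to be the older name for the same error.

```python
import io

import pandas as pd

# Hypothetical sample: a 3-field header, then one row where stray
# delimiters produce 5 fields instead of 3.
sample = "a|b|c\n1|2|3\n4|5|6|7|8\n9|10|11\n"

error_message = None
try:
    pd.read_csv(io.StringIO(sample), sep="|")
except pd.errors.ParserError as exc:
    # Recent pandas raises pandas.errors.ParserError for tokenizing failures.
    error_message = str(exc)

print(error_message)
```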

Is there a way to have pandas skip these lines and continue? Or, put differently, is there a way to make read_csv more flexible about how many columns it encounters?

By default header=0, so you don't need this parameter unless your file has no header row, in which case it should be header=None.
– EdChum, Commented Jun 25, 2015 at 7:46
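A quick way to see what the comment describes, using a made-up two-column sample: omitting header and passing header=0 give the same result, while header=None treats the first line as data and numbers the columns. (In current pandas the literal default is header='infer', which behaves like header=0 when names is not passed.)

```python
import io

import pandas as pd

# Hypothetical sample with a header row.
sample = "a|b\n1|2\n3|4\n"

default = pd.read_csv(io.StringIO(sample), sep="|")
explicit = pd.read_csv(io.StringIO(sample), sep="|", header=0)
no_header = pd.read_csv(io.StringIO(sample), sep="|", header=None)

print(default.columns.tolist())    # ['a', 'b']: header row used as names
print(no_header.columns.tolist())  # [0, 1]: header line kept as a data row
```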

Jianxun Li · Accepted Answer · 2015-06-24 22:04:48Z


Try this.

pd.read_csv('patentHeader.txt', sep="|", header=0, error_bad_lines=False)

error_bad_lines: if False, any line that causes a tokenizing error (for example, a line with too many fields) is skipped instead of aborting the read, and the skipped lines are reported once reading is done.
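error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0 in favor of on_bad_lines. A sketch of the same idea on newer versions, again on made-up sample data: on_bad_lines="skip" drops malformed rows silently ("warn" reports them instead), and with engine="python" a callable receives each bad line as a list of fields, which makes it easy to keep them for later inspection. The function and list names below are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample: the row "4|5|6|7|8" has too many fields.
sample = "a|b|c\n1|2|3\n4|5|6|7|8\n9|10|11\n"

# Drop malformed rows (pandas >= 1.3).
skipped = pd.read_csv(io.StringIO(sample), sep="|", on_bad_lines="skip")

# Keep a record of the malformed rows instead of discarding them silently.
# A callable for on_bad_lines requires engine="python" (pandas >= 1.4).
bad_rows = []


def collect_bad_line(fields):
    """Record the split fields of a bad line; returning None drops it."""
    bad_rows.append(fields)
    return None


audited = pd.read_csv(
    io.StringIO(sample),
    sep="|",
    engine="python",
    on_bad_lines=collect_bad_line,
)

print(skipped)
print(bad_rows)
```

Collecting the rejected lines is useful with a file like the one in the question, where skipping silently could hide a systematic delimiter problem.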

