I have a poorly formatted delimited file with errors in the delimiters, so some rows appear to have a different number of columns than others.

When I run

pd.read_csv('patentHeader.txt', sep="|", header=0)

the process dies with this error:

CParserError: Error tokenizing data. C error: Expected 9 fields in line 1034558, saw 15

Is there a way to have pandas skip these lines and continue? Put differently, is there some way to make read_csv more flexible about how many columns it encounters?

header=0 is the default, so you don't need this parameter. If your file has no header row, pass header=None instead. Commented Jun 25, 2015 at 7:46

1 Answer

Try this.

pd.read_csv('patentHeader.txt', sep="|", header=0, error_bad_lines=False)

error_bad_lines: if False, any line that causes a parsing error is skipped instead of aborting the read, and the skipped lines are reported.
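Note that error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; the replacement is on_bad_lines="skip". Below is a minimal sketch of the same idea on newer pandas, assuming version 1.3 or later. The in-memory sample data and its column names are made up for illustration and only stand in for patentHeader.txt.

```python
import io

import pandas as pd

# Hypothetical sample standing in for patentHeader.txt.
# The header declares 3 fields; the third data row has a stray "|" and so has 4.
raw = (
    "id|name|year\n"
    "1|alpha|2001\n"
    "2|beta|2002\n"
    "3|gam|ma|2003\n"
    "4|delta|2004\n"
)

# on_bad_lines="skip" drops rows with too many fields instead of raising.
# Use on_bad_lines="warn" to also get a warning for each dropped row.
df = pd.read_csv(io.StringIO(raw), sep="|", on_bad_lines="skip")

print(df)
```

For the real file, replace io.StringIO(raw) with the path 'patentHeader.txt'. Skipped rows are lost, so it is worth comparing the resulting row count against the line count of the file to see how much data was dropped.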
