Get number of bad lines/errors while reading CSV with pandas (error_bad_lines)

Question

I am reading a csv file in pandas, and I am skipping some bad lines / rows with:

df2 = pd.read_csv("Test.csv", sep=';', engine='python', error_bad_lines=False)

How can I count the total number of skipped rows in python?

Right now, I only get:

How can I count this?

PV8 · Accepted Answer · 2019-09-25 09:32:31Z

4

You could calculate the row difference:

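# Count the physical lines in the file (this includes the header line, if any).
with open("Test.csv") as f:
    len_csv = sum(1 for line in f)

# If the file has a header row, subtract 1 more: it is counted here but not in df2.
number_of_skipped_rows = len_csv - len(df2)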

PV8


answered Sep 25, 2019 at 9:29

Carsten



1 Comment

Check whether your data has a header row. The line count includes the header but len(df2) does not, so the difference is off by one unless you subtract 1 for it.
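A minimal sketch of the header adjustment described in this comment, using only the standard library. The helper name count_skipped_rows and its has_header flag are hypothetical and not part of pandas. It assumes one record per physical line (no quoted fields containing newlines) and that the parser ignores blank lines, which is the pandas default (skip_blank_lines=True).

```python
def count_skipped_rows(lines, n_parsed_rows, has_header=True):
    """Estimate how many data rows the parser dropped.

    lines         -- any iterable of text lines, e.g. an open file object
    n_parsed_rows -- number of rows that ended up in the DataFrame, e.g. len(df2)
    has_header    -- True if the first non-blank line is a header row
    """
    # Count non-blank lines only, assuming the parser skips blank lines too.
    n_lines = sum(1 for line in lines if line.strip())
    if has_header:
        # The header is a line in the file but not a row in the DataFrame.
        n_lines -= 1
    return n_lines - n_parsed_rows
```

With the file from the question this could be called as count_skipped_rows(f, len(df2)) inside a with open("Test.csv") as f: block.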

raj · Accepted Answer · 2019-09-25 09:47:24Z

2

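import pandas as pd

# Count the lines in the file; the with-block closes the handle afterwards.
with open("Test.csv") as f:
    row_count = len(f.readlines())
df2 = pd.read_csv("Test.csv", sep=';', engine='python', error_bad_lines=False)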

Count of skipped rows

skipped_rows = row_count - df2.shape[0]
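As an alternative to comparing line counts, newer pandas versions can report the bad lines directly. error_bad_lines was deprecated in pandas 1.3 in favor of on_bad_lines and removed in 2.0. From pandas 1.4 on, on_bad_lines also accepts a callable when engine='python'. The sketch below assumes pandas 1.4 or later. The wrapper name read_csv_collecting_bad_lines and the sample data are made up for illustration. As far as I know, only lines with too many fields are passed to the callable; lines with too few fields are padded with NaN instead.

```python
import io

import pandas as pd


def read_csv_collecting_bad_lines(source, **kwargs):
    """Read a CSV with the python engine and collect the lines pandas rejects."""
    bad_lines = []

    def remember_and_skip(bad_line):
        # bad_line is the offending row, already split into a list of strings.
        bad_lines.append(bad_line)
        return None  # returning None tells pandas to drop the line

    df = pd.read_csv(source, engine="python",
                     on_bad_lines=remember_and_skip, **kwargs)
    return df, bad_lines


# Made-up sample: the third line has one field too many.
sample = "a;b\n1;2\n3;4;5\n6;7\n"
df2, bad_lines = read_csv_collecting_bad_lines(io.StringIO(sample), sep=";")
print("parsed rows:", len(df2), "skipped rows:", len(bad_lines))
```

For the file in the question, the call would be read_csv_collecting_bad_lines("Test.csv", sep=';'), and len(bad_lines) gives the number of skipped rows without needing a header correction.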

answered Sep 25, 2019 at 9:32

raj


This is not working: the command df1 = pd.read_csv("Test.csv", sep=';', engine='python') will give an error.

name csv is not defined

Now it is working, but I do not need to import another library, so right now I prefer Carsten's solution.

Now it's quite similar, but the code also takes the same approach.