Wrong row count for CSV file in python

Question

I am processing a CSV file, and before that I get the row count using the code below.

total_rows=sum(1 for row in open(csv_file,"r",encoding="utf-8"))

The code was written with the help given in this link. However, total_rows doesn't match the actual number of rows in the CSV file. I have found an alternative way to do it, but I would like to know why this doesn't work correctly.

Some cells in the CSV file contain very long text, and I have to pass the encoding to avoid errors when reading the file.

Any help is appreciated!

When you have cells with huge text, a CSV parser such as pandas.read_csv reads them correctly, whereas open reads the file line by line and does not treat a multi-line text as one cell.
– Chris, Commented Mar 15, 2019 at 8:11
@Chris Yes, I have found other ways to count the rows correctly. But what is wrong with the code above, which was suggested for counting rows?
– Eswar, Commented Mar 15, 2019 at 8:12
As I said, suppose your huge text spans two or more lines. In CSV, that text is one cell within a single record. open, however, doesn't know it should be treated as one cell; it simply counts lines.
– Chris, Commented Mar 15, 2019 at 8:14
How many lines are in your file? Are you sure there is no mix of \r and \n that could give you a wrong count, or some \n or \r inside a cell's text?
– Allan, Commented Mar 15, 2019 at 8:14
I ran your code on a large file and it gave the correct output: python count_row.py printed 1715181568, the same result as wc -l.
– Allan, Commented Mar 15, 2019 at 8:17
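Allan's point about mixed line endings can be checked directly. A minimal sketch, using made-up sample bytes: open() in text mode uses universal newlines by default (newline=None), so a lone \r, a \n, or a \r\n each ends a line, whereas wc -l only counts \n bytes. The sample data and variable names are illustrative assumptions.

```python
import io

# Hypothetical sample bytes mixing a lone \r, a \n and a \r\n line ending.
raw = b"a\rb\nc\r\nd\n"

# Text mode with the default newline=None (universal newlines):
# \r, \n and \r\n are all treated as line breaks, just like open() does.
text_lines = sum(1 for _ in io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8"))

# What `wc -l` reports: the number of \n bytes.
newline_bytes = raw.count(b"\n")

print(text_lines, newline_bytes)  # 4 3
```

So a file containing stray \r characters can make the Python count disagree with wc -l even when no cell spans several lines.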

Eswar · Accepted Answer · 2019-03-15 09:24:53Z

Suppose you have a CSV file in which one cell contains multi-line text.

$ cat example.csv
colA,colB
1,"Hi. This is Line 1.
And this is Line2"

By the look of it, the file has three lines, and wc -l agrees:

$ wc -l example.csv
3 example.csv

And so does open with sum:

sum(1 for row in open('./example.csv',"r",encoding="utf-8"))
# 3

But now read it with a CSV parser such as pandas.read_csv:

import pandas as pd

df = pd.read_csv('./example.csv')
df
   colA                                    colB
0     1  Hi. This is Line 1.\nAnd this is Line2

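Another way to get the correct number of rows, without pandas, is to let the csv module do the parsing:

import csv

with open(csv_file, "r", encoding="utf-8", newline="") as f:
    reader = csv.reader(f, delimiter=",")
    # Counts parsed records (header included); subtract 1 for data rows only
    row_count = sum(1 for _ in reader)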

Excluding the header, pandas finds 1 row, which I believe is what you expect. This is because the first cell of colB (the huge text block) is handled correctly: the quotes wrapping the entire text mark it as a single field.
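To see the two counts side by side without pandas, here is a minimal sketch that feeds the same example.csv content, held in an in-memory string so the snippet is self-contained, to both a plain line count and csv.reader:

```python
import csv
import io

# Same content as example.csv above, kept in memory for the demo.
content = 'colA,colB\n1,"Hi. This is Line 1.\nAnd this is Line2"\n'

# Physical lines: what sum(1 for row in open(...)) counts.
physical_lines = sum(1 for _ in io.StringIO(content))

# Parsed records: csv.reader keeps the quoted multi-line cell in one record.
records = list(csv.reader(io.StringIO(content)))

print(physical_lines)    # 3
print(len(records))      # 2 (header + 1 data row)
print(len(records) - 1)  # 1 data row, matching the pandas result
```

The difference is exactly the newline inside the quoted cell: the line iterator splits on it, while the CSV parser knows it is inside a quoted field.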

wdudzik · Accepted Answer · 2019-03-15 08:27:30Z


I think the problem here is that you are not counting rows but newlines (\r\n on Windows, \n on Linux). This breaks when a cell contains text with a newline character, for example:

1, "my huge text\n with many lines\n"
2, "other text"

For the data above, your method returns 4 when there are actually only 2 rows.
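A minimal sketch reproducing this with the standard-library csv module, with the data held in an in-memory string. One detail of the example as written: there is a space between each comma and the opening quote, and by default csv.reader only treats a quote as the start of a quoted field when it is the first character of the field, so skipinitialspace=True is passed here to skip that space.

```python
import csv
import io

# The example above as literal text; note the space after each comma.
data = '1, "my huge text\n with many lines\n"\n2, "other text"\n'

# Counting lines, as the question's code does.
line_count = sum(1 for _ in io.StringIO(data))

# Counting CSV records; skipinitialspace=True lets the reader treat the
# quote that follows ", " as the start of a quoted field.
rows = list(csv.reader(io.StringIO(data), skipinitialspace=True))

print(line_count)  # 4
print(len(rows))   # 2
```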

Use pandas or another CSV-aware library to read the file. For example:

import pandas as pd
df = pd.read_csv(pathToCsv, sep=",", header=None)  # header=None: a header line, if any, counts as a row
number_of_rows = len(df.index)  # or df[0].count()

Note that len(df.index) and df[0].count() are not interchangeable as count excludes NaNs.
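A small sketch of that difference, using hypothetical in-memory data where the first column is empty in one row:

```python
import io

import pandas as pd

# Hypothetical data: the first column is empty in the second row.
df = pd.read_csv(io.StringIO("x,y\n,2\n3,4\n"), header=None)

print(len(df.index))  # 3: every row, including the one with a missing value
print(df[0].count())  # 2: count() skips the NaN in column 0
```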


3 Comments

Eswar

I was able to get the correct number of rows without pandas, but are you saying that open also counts the lines inside each cell?

wdudzik

Yes, because open just reads the file line by line. It does not take into account that the file is a CSV.

Eswar

Chris summed up what you're saying. Thank you.
