
I have an Excel sheet that contains one million rows. Only the first hundred rows or so contain data; the remaining rows are empty. pandas.read_excel internally uses xlrd to read the data, and xlrd reads the whole sheet, which takes a long time (around 65 seconds). I tried the code below but could not reduce the reading time.

df = pd.read_excel(file_path, sheet_name=sheetname, nrows=1000, skiprows=1, header=None)

I have 8 GB of RAM on my machine, running Windows 10. I'm using pandas 0.25.3.

Is there any other, more optimal solution to reduce the reading time?


1 Answer


The keep_default_na=False parameter may reduce read time, since it skips converting empty cells to NaN values when parsing the Excel file.

Example usage:

df = pd.read_excel('test.xlsx', keep_default_na=False)

4 Comments

This helps: the time went down from 65 seconds to 54 seconds. But I was hoping for something around 5 to 10 seconds.
As far as I know, read_csv is faster than read_excel, so if you can open the Excel file and save it as a CSV file, that might help.
Yes, I saw a few other question posts suggesting conversion to CSV. I will try that. Thanks.
read_csv is faster. Thanks!
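The convert-once-then-read-CSV approach from the comments can be sketched as below. This is a minimal sketch, not part of the original answer: the function name read_sheet_fast and the file paths are placeholders you would adapt to your own files. The first call pays the slow xlrd read once; every later call hits the CSV cache through pandas' much faster C parser.

```python
import os
import pandas as pd

def read_sheet_fast(xlsx_path, csv_path, sheet_name=0):
    """Convert the Excel sheet to CSV once, then serve all later
    reads from the cached CSV file.
    """
    if not os.path.exists(csv_path):
        # One-off slow read through the Excel engine; subsequent
        # calls skip this branch entirely.
        df = pd.read_excel(xlsx_path, sheet_name=sheet_name,
                           keep_default_na=False)
        df.to_csv(csv_path, index=False)
    # read_csv only parses rows that actually exist in the file,
    # so the million empty Excel rows never slow it down again.
    return pd.read_csv(csv_path)
```

If the Excel file changes over time, you would also want to compare modification times (os.path.getmtime) before trusting the cached CSV.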
