
I have an Excel sheet that contains one million rows. Only the first hundred rows or so contain data; the remaining rows are empty. pandas.read_excel internally uses xlrd to read the data, and xlrd reads the whole sheet, which takes a long time (around 65 seconds). I tried the code below but could not reduce the reading time.

df = pd.read_excel(file_path, sheet_name=sheetname, nrows=1000, skiprows=1, header=None)

I have 8 GB of RAM on my machine, running Windows 10. I'm using pandas 0.25.3.

Is there any other, more optimal solution to reduce the reading time?


1 Answer


The keep_default_na=False parameter may reduce read time, since it skips converting empty cells to NaN values when parsing the Excel file.

Example usage:

df = pd.read_excel('test.xlsx', keep_default_na=False)

4 Comments

This helps: the time went down from 65 seconds to 54 seconds. But I was hoping for something around 5 to 10 seconds.
As far as I know, read_csv is faster than read_excel, so if you can open the Excel file and save it as a CSV file, that might help.
Yes, I saw a few other question posts suggesting conversion to CSV. I will try that. Thanks.
read_csv is faster. Thanks!
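The convert-once-then-read-CSV approach from the comments can be sketched as below. This is a minimal sketch, not part of the original answer: the function name read_sheet_fast and the file paths are placeholders you would adapt to your own files. The first call pays the slow xlrd read once; every later call hits the CSV cache through pandas' much faster C parser.

```python
import os
import pandas as pd

def read_sheet_fast(xlsx_path, csv_path, sheet_name=0):
    """Convert the Excel sheet to CSV once, then serve all later
    reads from the cached CSV file.
    """
    if not os.path.exists(csv_path):
        # One-off slow read through the Excel engine; subsequent
        # calls skip this branch entirely.
        df = pd.read_excel(xlsx_path, sheet_name=sheet_name,
                           keep_default_na=False)
        df.to_csv(csv_path, index=False)
    # read_csv only parses rows that actually exist in the file,
    # so the million empty Excel rows never slow it down again.
    return pd.read_csv(csv_path)
```

If the Excel file changes over time, you would also want to compare modification times (os.path.getmtime) before trusting the cached CSV.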
