
I would like to read many sheets of an Excel file into separate pandas DataFrames.

So far I use:

import os
import pandas as pd

myfile = filename
myfilecomplete = os.path.join(mypath, myfile)
df_data = pd.read_excel(myfilecomplete, sheet_name='DATA', skiprows=4, index_col=1, usecols="A:I")

There are around 10 sheets to read in the Excel file, so I repeat that last line 10 times, adapted for every sheet:

df_data2 = pd.read_excel(myfilecomplete, sheet_name='Whatever', skiprows=3, index_col=1, usecols="A:O")

etc...

Observe that every sheet is read differently (different columns and starting row).

Now, the process takes quite some time. The Excel file is not extremely big (around 3 MB), and only around a third of the sheets have headers.

I am trying to find ways to accelerate this process. Waiting 10 seconds is too much, since the user has to run this process continuously.

Any ideas? I assume that with pd.read_excel the code accesses the disk every time it reads a sheet, whereas it seems more logical to load the Excel file into memory once and parse the sheets from there. Would that help? How would you do it?

I am still quite a beginner, but I hear a lot about concurrency and parallel computing; could that help here?

Thanks.

1 Answer


You can read the entire file in once with ExcelFile and then parse the individual sheets from that object.

xlFile = pd.ExcelFile(myfilecomplete)
df_data = pd.read_excel(xlFile, sheet_name='DATA', skiprows=4, index_col=1, usecols="A:I")
df_data2 = pd.read_excel(xlFile, sheet_name='Whatever', skiprows=3, index_col=1, usecols="A:O")

1 Comment

Great. My way: 40 seconds; your way: 8 seconds.
