I would like to read many sheets of an Excel file into several pandas DataFrames.
So far I use:
    import os
    import pandas as pd

    myfile = filename
    myfilecomplete = os.path.join(mypath, myfile)
    df_data = pd.read_excel(myfilecomplete, sheet_name='DATA', skiprows=4, index_col=1, usecols="A:I")
There are around 10 sheets to read in the Excel file, so I repeat that last line 10 times, adapting it for each sheet:

    df_data2 = pd.read_excel(myfilecomplete, sheet_name='Whatever', skiprows=3, index_col=1, usecols="A:O")
etc...
Note that every sheet is read differently (different columns and starting row).
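To make the repetition concrete, my script is roughly equivalent to this (only the first two entries are the real ones from above; the rest are placeholders):

    import pandas as pd

    # Per-sheet read options; each sheet has its own skiprows/index_col/usecols
    sheet_specs = {
        'DATA':     dict(skiprows=4, index_col=1, usecols="A:I"),
        'Whatever': dict(skiprows=3, index_col=1, usecols="A:O"),
        # ... eight more sheets, each with its own options
    }

    # myfilecomplete as defined above
    frames = {name: pd.read_excel(myfilecomplete, sheet_name=name, **opts)
              for name, opts in sheet_specs.items()}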
Now, the process takes quite some time. The Excel file is not extremely big (around 3 MB), and only around a third of its sheets are needed.
I am trying to find ways to accelerate this process. Waiting 10 seconds is too long, since the user has to run this process continuously.
Any ideas? I suspect that with pd.read_excel the code accesses the disk every time it reads a sheet, whereas it seems more logical to load the Excel file into memory once and parse the sheets from there. Would that help? How do you do it?
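For instance, is something like this the right way to open the workbook only once? A minimal sketch of what I have in mind, assuming pd.ExcelFile keeps the file open across parses:

    import pandas as pd

    # Open the workbook once, then parse each sheet from the same handle
    xls = pd.ExcelFile(myfilecomplete)  # myfilecomplete as defined above
    df_data = xls.parse('DATA', skiprows=4, index_col=1, usecols="A:I")
    df_data2 = xls.parse('Whatever', skiprows=3, index_col=1, usecols="A:O")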
I am still quite a beginner, but I hear a lot about concurrency and parallel computing. Would that help here?
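For example, would submitting one read per sheet to a thread pool speed things up? A rough sketch of what I imagine, reusing the sheet_specs dict from above (I don't know whether the reads can actually run in parallel):

    from concurrent.futures import ThreadPoolExecutor
    import pandas as pd

    def read_sheet(name, opts):
        # Each task reads one sheet; opts holds that sheet's skiprows/index_col/usecols
        return name, pd.read_excel(myfilecomplete, sheet_name=name, **opts)

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(read_sheet, name, opts) for name, opts in sheet_specs.items()]
        frames = dict(f.result() for f in futures)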
Thanks.