I'm trying to build a data ingestion routine that loads data from multiple Excel files, each with multiple tabs and columns, into pandas data frames. The tabs are structured the same way in every Excel file. Each tab of each file should be a separate data frame. So far, I have created a list with one data frame per Excel file, which holds the data from all of that file's tabs concatenated together. What I'm looking for is a data structure from which I can access each Excel file, and each tab of that file as a separate data frame. My current code is below. Any improvement would be appreciated! Please let me know if anything else is needed.

import os
import pandas as pd

# Folder that contains the Excel files
folder = 'specified_path'

# List the files in that folder
excel_files = os.listdir(folder)

list_of_dfs = []
for file in excel_files:
    # sheet_name=None reads every tab; concat stacks them into one data frame
    df = pd.concat(pd.read_excel(folder + "\\" + file, sheet_name=None), ignore_index=True)
    df['excelfile_name'] = file.split('.')[0]
    list_of_dfs.append(df)

2 Answers

I would propose to change the line

    df = pd.concat(pd.read_excel(folder + "\\" + file, sheet_name=None), ignore_index=True)

to

    df = pd.concat(pd.read_excel(folder + "\\" + file, sheet_name=None))
    df.index = df.index.get_level_values(0)
    df = df.reset_index().rename({'index':'Tab'}, axis=1)
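
To show how the resulting `Tab` column can be used, here is a minimal sketch. It assumes a small in-memory dict standing in for the output of `pd.read_excel(..., sheet_name=None)`, so it runs without any Excel file; the tab names, the values, and the file name `report_2022` are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(path, sheet_name=None),
# which returns a dict mapping tab names to DataFrames.
sheets = {
    "Jan": pd.DataFrame({"Reviewed": [43, 16], "Adjusted": [20, 8]}),
    "Feb": pd.DataFrame({"Reviewed": [8, 17], "Adjusted": [3, 3]}),
}

# Concatenating a dict puts its keys (the tab names) in index level 0.
df = pd.concat(sheets)
df.index = df.index.get_level_values(0)
df = df.reset_index().rename({"index": "Tab"}, axis=1)
df["excelfile_name"] = "report_2022"  # hypothetical file name

# Pull out a single tab by filtering on the new column.
feb = df[df["Tab"] == "Feb"].drop(columns="Tab").reset_index(drop=True)
```

The same kind of filter on `excelfile_name` would select one file once the frames of several files are stacked together.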

3 Comments

This change does not give the expected result: it stacks the data from all tabs vertically in a single data frame.
Exactly, but it has a separate column for the tab name and one for the Excel file. With such a table, you can easily filter for the tabs you need, or for combinations of them.
That is true. I used the method above as I’m more comfortable with it and the data set is very large. But thanks for the help, appreciated!

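To create a separate dataframe for each tab in an Excel file, one could iterate over the unique values of index level 0 (the tab names) and select the rows for each one. In the example below, the tabs of the test file happen to have identical content: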

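import pandas as pd

# filename is the path to one Excel file
df = pd.concat(pd.read_excel(filename, sheet_name=None))
list_of_dfs = []
# Level 0 of the index holds the tab names
for tab in df.index.get_level_values(0).unique():
    tab_df = df.loc[tab]
    list_of_dfs.append(tab_df)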

For illustration, here is the dataframe content after reading an Excel file with 3 tabs: [image: full dataframe]

After running the above code, here is the content of list_of_dfs:

[        Date  Reviewed  Adjusted
 0 2022-07-11        43        20
 1 2022-07-18        16         8
 2 2022-07-25         8         3
 3 2022-08-01        17         3
 4 2022-08-15        14         6
 5 2022-08-22        12         5
 6 2022-08-29         8         4,
         Date  Reviewed  Adjusted
 0 2022-07-11        43        20
 1 2022-07-18        16         8
 2 2022-07-25         8         3
 3 2022-08-01        17         3
 4 2022-08-15        14         6
 5 2022-08-22        12         5
 6 2022-08-29         8         4,
         Date  Reviewed  Adjusted
 0 2022-07-11        43        20
 1 2022-07-18        16         8
 2 2022-07-25         8         3
 3 2022-08-01        17         3
 4 2022-08-15        14         6
 5 2022-08-22        12         5
 6 2022-08-29         8         4]
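
Since `pd.read_excel(..., sheet_name=None)` already returns a dict that maps each tab name to its own DataFrame, one possible alternative is to skip the concatenation and keep a dict of those dicts, keyed by file. The sketch below is only one way to arrange this: `load_workbooks` and its `read_sheets` parameter are hypothetical names, and it assumes the folder holds `.xlsx` files.

```python
import os
import pandas as pd


def load_workbooks(folder, read_sheets=None):
    """Return {file_stem: {tab_name: DataFrame}} for each .xlsx file in folder.

    load_workbooks and read_sheets are hypothetical names for this sketch.
    """
    if read_sheets is None:
        # sheet_name=None makes read_excel return a dict of all tabs.
        def read_sheets(path):
            return pd.read_excel(path, sheet_name=None)

    workbooks = {}
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".xlsx"):
            continue  # skip anything that is not an .xlsx workbook
        stem = os.path.splitext(name)[0]
        workbooks[stem] = read_sheets(os.path.join(folder, name))
    return workbooks
```

After `workbooks = load_workbooks(folder)`, a single tab would be reached as `workbooks[file_stem][tab_name]`. Because nothing is concatenated, each DataFrame keeps only the columns of its own tab.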

2 Comments

This solution seems to work partially. I'm getting all the tabs as different data frames that I can access from one single list. But I'm only able to access the data for the last Excel file that was parsed, not all of them. Also, each data frame contains the columns from all the tabs, with NaN values in the columns that were not in that particular tab.
I have resolved the issue of getting all the Excel files into one list: I initialize a new list before the first for-loop and append to it the list that stores the data from each tab. The only remaining issue is removing the unnecessary columns from the data frames for tabs where I'm getting NaN values.
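
The extra NaN columns appear because `pd.concat` builds the union of all tabs' columns. One possible way to remove them, sketched here under the assumption that a column which is entirely NaN within a tab did not belong to that tab, is to call `dropna(axis=1, how="all")` on each per-tab frame. The in-memory `sheets` dict and its tab names are made up and stand in for the output of `pd.read_excel(..., sheet_name=None)`.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(path, sheet_name=None):
# two tabs with different columns.
sheets = {
    "Reviews": pd.DataFrame({"Date": ["2022-07-11"], "Reviewed": [43]}),
    "Adjustments": pd.DataFrame({"Date": ["2022-07-11"], "Adjusted": [20]}),
}

# concat takes the union of the columns; missing cells become NaN.
df = pd.concat(sheets)

frames = {}
for tab in df.index.get_level_values(0).unique():
    # how="all" drops only the columns that are NaN in every row of this tab.
    frames[tab] = df.loc[tab].dropna(axis=1, how="all")
```

Note that a column that genuinely exists in a tab but is completely empty there would be dropped as well.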
