I want to loop through a directory, find specific .xlsx files, and load each one into a separate pandas DataFrame. I also want all sheets in each of those Excel files to be included.

Below is a sample of the code I have so far; I just need to add the logic to pick up all sheets:

import pandas as pd
from glob import glob

path = 'path_to_file'

files = glob(path + '/*file*.xlsx')

get_df = lambda f: pd.read_excel(f)

dodf = {f: get_df(f) for f in files}

dodf[files[2]]  # look up one entry in the dictionary of dataframes
  • I have already created a dictionary to access each dataframe. So the current method should be fine, i.e. in a dictionary. Commented Sep 13, 2017 at 19:44

1 Answer

As described in this answer, pandas still gives you access to the ExcelFile class, which loads the file and returns an object representing the workbook.

This object has a .sheet_names property, which gives you a list of the sheet names in that file.

xl = pd.ExcelFile('foo.xls')
xl.sheet_names  # list of all sheet names

To actually import a specific sheet, call .parse(sheet_name) on the ExcelFile object:

xl.parse(sheet_name)  # read a specific sheet to DataFrame

For your code something like:

get_df = lambda f: pd.ExcelFile(f)
dodf = {f: get_df(f) for f in files}

...gives you dodf, a dictionary of ExcelFile objects keyed by file path.

filename = 'yourfilehere.xlsx'  # must be one of the keys of dodf, i.e. a path returned by glob
a_valid_sheet = dodf[filename].sheet_names[0]  # first sheet
df = dodf[filename].parse(a_valid_sheet)  # read that sheet into a DataFrame
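
To avoid typing a filename or sheet name by hand, the same two members (.sheet_names and .parse) can be applied in a loop. The following is a minimal sketch, assuming dodf is the dictionary of ExcelFile objects built above; the helper names parse_all_sheets and parse_all_workbooks are made up for illustration and are not part of pandas.

```python
def parse_all_sheets(workbook):
    """Return {sheet name: DataFrame} for one pd.ExcelFile-like object."""
    # Relies only on .sheet_names and .parse(), the two members used above.
    return {name: workbook.parse(name) for name in workbook.sheet_names}


def parse_all_workbooks(workbooks):
    """Apply parse_all_sheets to every entry of a {path: ExcelFile} dict."""
    return {path: parse_all_sheets(wb) for path, wb in workbooks.items()}


# Usage with the dictionary built above (dodf maps path -> pd.ExcelFile):
# frames = parse_all_workbooks(dodf)
# frames[files[0]]["Sheet1"]   # "Sheet1" is a placeholder sheet name
```

The result is a nested dictionary: the outer keys are the file paths from glob, the inner keys are sheet names, and each value is the DataFrame for that sheet.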

4 Comments

I do not want to manually input the filename. Is there a way to get it from the dictionary dodf that I have created? I am completely new to Python so I do not know as such how it all works.
Yes, but you need to change your pd.read_excel(f) to pd.ExcelFile(f). Once that is done, each object will have the .sheet_names attribute, which is a list of the sheets in that file.
Yes, I already did. But then I have to parse each of the sheets into a dataframe individually, right?
@ManasJani that's right. But you can iterate over the list of sheet_names to do this, e.g. for sheet in your_xls_obj.sheet_names: df = your_xls_obj.parse(sheet)
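
As an alternative to iterating over sheet_names by hand (this is not what the answer above uses), pd.read_excel accepts sheet_name=None, in which case it returns a dictionary mapping every sheet name to its DataFrame. A minimal sketch, assuming a pandas version where the keyword is spelled sheet_name (older releases used sheetname) and an Excel engine such as openpyxl is installed; load_workbooks is a hypothetical helper name.

```python
from glob import glob

import pandas as pd


def load_workbooks(pattern):
    """Map each file matching `pattern` to a {sheet name: DataFrame} dict."""
    # sheet_name=None asks read_excel for every sheet, not only the first one.
    return {f: pd.read_excel(f, sheet_name=None) for f in sorted(glob(pattern))}


# Usage, mirroring the glob pattern from the question:
# dodf = load_workbooks('path_to_file/*file*.xlsx')
# for path, sheets in dodf.items():
#     for sheet, df in sheets.items():
#         print(path, sheet, df.shape)
```

This keeps the question's dictionary-of-files layout and only changes each value from a single DataFrame to a dictionary of DataFrames.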
