Python Pandas - loop through folder of Excel files, export data from each Excel file's sheet into their own .xlsx file

Question

I have a folder of Excel files, many of which have 3-4 tabs worth of data that I just want as individual Excel files. For example, let's say I have an Excel file with three tabs: "employees", "summary", and "data". I would want this to create 3 new Excel files out of this: employees.xlsx, summary.xlsx, and data.xlsx.

I can already loop through the folder, open each Excel file, and find the name of each sheet, but I have been struggling to figure out how to export the data from each sheet into its own Excel file. Here's what I have so far:

import pandas as pd
import os

# filenames
files = os.listdir()
excel_names = list(filter(lambda f: f.endswith('.xlsx'), files))

excels = [pd.ExcelFile(name, engine='openpyxl') for name in excel_names]
sh = [x.sheet_names for x in excels]  # I am getting all of the sheet names here
for s in sh:
    for x in s:
        # this is where I want to start exporting each sheet as its own spreadsheet;
        # I want to eventually export it, this is a placeholder:
        # df.to_excel("output.xlsx", header=False, index=False)
        pass

Does this need to be done using Python? It could easily be done using VBA. – norie, commented Mar 11, 2021 at 13:44
In this particular case it does need to be done in Python. VBA might be more practical, but it's a long story involving organizational rules that I will spare you. – tenebris silentio, commented Mar 11, 2021 at 13:48

It_is_Chris · Accepted Answer · 2021-03-11 14:08:54Z

import pandas as pd
import glob

# get the file names using glob
# (this assumes that the files are in the current working directory)
excel_names = glob.glob('*.xlsx')
# iterate through the excel file names
for excel in excel_names:
    # read the excel file with sheet_name=None;
    # this returns a dict of {sheet name: DataFrame}
    dfs = pd.read_excel(excel, sheet_name=None)
    # iterate over the dict items (the keys are the sheet names)
    for key, df in dfs.items():
        # use an f-string (Python 3.6+) to name
        # the new file after the sheet
        df.to_excel(f'{key}.xlsx', index=False)
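One caveat with this approach: output files are named only after the sheet, so if two workbooks both contain a "summary" sheet, the second one silently overwrites summary.xlsx. Re-running the script in the same folder also picks up the files it wrote earlier, because glob('*.xlsx') matches them too. Below is a minimal sketch of a variant, assuming that prefixing each output with its workbook name is acceptable; the function name split_all, the <workbook>_<sheet>.xlsx naming scheme, and the split_sheets output folder are hypothetical choices, not part of the answer. It also passes engine="openpyxl" explicitly, the same reader the question already uses for pd.ExcelFile.

```python
from pathlib import Path

import pandas as pd


def split_all(src_dir=".", out_dir="split_sheets"):
    """Split each workbook's sheets into files named <workbook>_<sheet>.xlsx.

    split_all, the naming scheme and the default out_dir are hypothetical.
    Returns the list of Paths written.
    """
    src = Path(src_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # sorted() materialises the glob before any output is written
    for path in sorted(src.glob("*.xlsx")):
        if path.name.startswith("~$"):
            continue  # skip Excel's temporary lock files
        # sheet_name=None -> dict mapping sheet name -> DataFrame
        sheets = pd.read_excel(path, sheet_name=None, engine="openpyxl")
        for sheet_name, df in sheets.items():
            # Prefix with the workbook's stem so that two workbooks that both
            # have a "summary" sheet don't overwrite each other's output.
            target = out / f"{path.stem}_{sheet_name}.xlsx"
            df.to_excel(target, index=False)
            written.append(target)
    return written
```

Writing into a subfolder keeps the outputs out of the non-recursive *.xlsx pattern, so a second run does not re-split them.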

4 Comments

tenebris silentio Over a year ago

Thank you, I'm getting an xlrd.biffh.XLRDError: Excel xlsx file; not supported error. Is there a way to do this using openpyxl instead of xlrd? I haven't worked with glob much, so I don't know if I can change this setting.

It_is_Chris Over a year ago

@tenebrissilentio what version of pandas are you using? print(pd.__version__) Is the error thrown on the pd.read_excel(...) portion of the code?

tenebris silentio Over a year ago

Apparently my version was too old. I updated pandas and it worked. I figured that over the course of 7 months an outdated pandas couldn't have been the issue, but boy was I wrong. That did it. Thanks so much.

It_is_Chris Over a year ago

@tenebrissilentio You're welcome and good luck.
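The loop in the question can also be finished directly, without switching to read_excel, by keeping each pd.ExcelFile together with its own sheet names instead of building a separate list of names that no longer knows which workbook it came from. Below is a minimal sketch, assuming openpyxl is installed and that writing the outputs into a separate folder is acceptable; split_workbooks and the split_sheets default are hypothetical names.

```python
import os

import pandas as pd


def split_workbooks(src_dir=".", out_dir="split_sheets"):
    """Write each sheet of every .xlsx file in src_dir to its own .xlsx file.

    Outputs are named after the sheet (e.g. employees.xlsx), as the question
    asks, and go to out_dir (hypothetical) so a source workbook is never
    overwritten while it is being read. Returns the list of paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    # Same filter as the question; "~$" files are Excel's temporary lock files.
    excel_names = [
        f for f in os.listdir(src_dir)
        if f.endswith(".xlsx") and not f.startswith("~$")
    ]
    written = []
    for name in excel_names:
        # The context manager closes each workbook once its sheets are exported.
        with pd.ExcelFile(os.path.join(src_dir, name), engine="openpyxl") as xls:
            for sheet in xls.sheet_names:
                df = xls.parse(sheet)  # first row becomes the column header
                out_path = os.path.join(out_dir, f"{sheet}.xlsx")
                df.to_excel(out_path, index=False)
                written.append(out_path)
    return written
```

Because parse() reads the first row as the header and to_excel writes it back (index=False only drops the row index), the header row round-trips; the placeholder's header=False would drop it. Like the accepted answer, this still lets a later workbook overwrite an earlier one when both have a sheet with the same name.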