1

I know this type of question is asked all the time. But I am having trouble figuring out the best way to do this.

I wrote a script that reformats a single Excel file using pandas. It works great.

Now I want to loop through multiple Excel files, perform the same reformat on each, and stack the reformatted data from each workbook at the bottom of one table, one after another.

I believe the first step is to make a list of all Excel files in the directory. There are so many different ways to do this that I am having trouble finding the best one.

Below is the code I am currently using to find multiple .xlsx files and create a list.

import os
import glob

os.chdir('C:\ExcelWorkbooksFolder')
for FileList in glob.glob('*.xlsx'):
         print(FileList)

I am not sure if the previous glob code actually created the list that I need.

I also have trouble understanding where to go from there. The code below fails at pd.ExcelFile(File). I believe I am missing something.

# create for loop
for File in FileList:
    for x in File:
# Import the excel file and call it xlsx_file
xlsx_file = pd.ExcelFile(File)
xlsx_file
# View the excel files sheet names
xlsx_file.sheet_names
# Load the xlsx files Data sheet as a dataframe
df = xlsx_file.parse('Data',header= None)
# select important rows,
df_NoHeader = df[4:]
#then It does some more reformatting.

Any help is greatly appreciated

2
  • Check your indentation. Whitespace and indentation matter in Python. Also, you should avoid using File or file as variable names in Python because file is a builtin (in Python 2). Commented May 23, 2016 at 17:38
  • Thanks! That was an issue. Commented May 24, 2016 at 11:01

2 Answers

4

I solved my problem. Instead of glob, I used os.listdir to list the Excel files, then looped through each file, reformatted it, and appended the result to a list that is concatenated into the final table at the end.

import os
import pandas as pd

# First create an empty list to collect each reformatted DataFrame.
appended_data = []

folder = r'C:\ExcelFiles'
for WorkingFile in os.listdir(folder):
    # os.listdir returns bare names, so join each one with the folder path.
    path = os.path.join(folder, WorkingFile)
    if os.path.isfile(path):
        # Import the Excel file and call it xlsx_file
        xlsx_file = pd.ExcelFile(path)
        # Load the workbook's 'sheet1' sheet as a DataFrame
        df = xlsx_file.parse('sheet1', header=None)

        #.... do some reformatting, call finished sheet reformatedDataSheet
        reformatedDataSheet = df  # placeholder for the elided reformatting
        appended_data.append(reformatedDataSheet)

# Combine all collected DataFrames into one table.
appended_data = pd.concat(appended_data)

And that's it; it does everything I wanted.
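
The collect-then-concatenate pattern in this answer can be tried without any Excel files. The following is only a minimal sketch: the two small frames, their column names, and their values are made up to stand in for reformatted sheets, and `ignore_index=True` is an optional choice that the original code does not use.

```python
import pandas as pd

# Hypothetical stand-ins for the reformatted sheets from two workbooks.
frames = [
    pd.DataFrame({"source": ["a.xlsx", "a.xlsx"], "value": [1, 2]}),
    pd.DataFrame({"source": ["b.xlsx"], "value": [3]}),
]

# Concatenate once, after all frames are collected.
# ignore_index=True renumbers the rows 0..n-1 instead of keeping
# each frame's own row labels.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Building a list and calling pd.concat once at the end is usually cheaper than growing a DataFrame inside the loop, since each concatenation copies the data.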


3

You need to change

os.chdir('C:\ExcelWorkbooksFolder')
for FileList in glob.glob('*.xlsx'):
         print(FileList)

to just

os.chdir('C:\ExcelWorkbooksFolder')
FileList = glob.glob('*.xlsx')
print(FileList)

Why does this fix it? glob returns a single list. Since you put for FileList in glob.glob(...), you're going to walk that list one by one and put the result into FileList. At the end of your loop, FileList is a single filename - a single string.
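
The difference can be seen in a small sketch that does not depend on the asker's folder. The file names here are hypothetical and are created empty in a temporary directory; only `glob.glob`, `os.path`, and `tempfile` from the standard library are used.

```python
import glob
import os
import tempfile

# Hypothetical workbook names, created as empty files in a temp directory.
with tempfile.TemporaryDirectory() as folder:
    for name in ("jan.xlsx", "feb.xlsx", "notes.txt"):
        open(os.path.join(folder, name), "w").close()

    pattern = os.path.join(folder, "*.xlsx")

    # Looping rebinds the name on every pass, so it ends up as one string.
    for file_list in glob.glob(pattern):
        pass
    last_value_type = type(file_list)

    # Assigning the call's result keeps the whole list of matches.
    file_list = sorted(glob.glob(pattern))
    base_names = [os.path.basename(p) for p in file_list]

print(last_value_type)  # <class 'str'>
print(base_names)       # ['feb.xlsx', 'jan.xlsx']
```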

When you do this code:

for File in FileList:
    for x in File:

FileList is a string at this point, so the first line binds File to each character of that last filename in turn, starting with the first. The second line then binds x to the only character in File. A single character is unlikely to be a valid filename, so pd.ExcelFile(File) throws an error.
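
As a quick illustration of why the loop sees single characters, here is a sketch with a hypothetical filename standing in for whatever was last left in FileList:

```python
# Hypothetical filename standing in for the last value left in FileList.
leftover = "report.xlsx"

# Iterating over a string yields one-character strings.
chars = [item for item in leftover]
print(chars[:3])  # ['r', 'e', 'p']

# Iterating over a list of filenames yields whole filenames instead.
names = [item for item in ["report.xlsx", "summary.xlsx"]]
print(names)  # ['report.xlsx', 'summary.xlsx']
```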

3 Comments

Ah, yes of course. Thanks. But I guess I am still having trouble understanding the glob function. I spent more time working on the code and used os.listdir to solve my problem. Thanks for your help though!
os.listdir is usually more than enough (and appropriate for your case): it lists every entry in the directory, which you can filter yourself. glob is useful for the times when you'd want something like ls */*.xls, i.e. you want to match wildcards, especially as part of a path. You can do that with os.walk, but it's harder; glob just returns a flat list of all the matching paths, which is convenient.
Thanks, now I understand!
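
The contrast drawn in these comments can be sketched as follows. The directory layout and file names are hypothetical and are created in a temporary directory, so nothing here reflects the asker's actual folders.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one workbook at the top level, one in a subfolder.
    os.mkdir(os.path.join(root, "2016"))
    for rel in ("top.xlsx", "readme.txt", os.path.join("2016", "may.xlsx")):
        open(os.path.join(root, rel), "w").close()

    # os.listdir returns bare names for every entry in one directory
    # (subdirectory names included), so the filtering is done by hand.
    listed = sorted(n for n in os.listdir(root) if n.endswith(".xlsx"))

    # glob matches wildcards inside the path itself and returns the
    # matching paths as a flat list.
    nested = glob.glob(os.path.join(root, "*", "*.xlsx"))
    nested_rel = [os.path.relpath(p, root) for p in nested]

print(listed)      # ['top.xlsx']
print(nested_rel)  # the one workbook inside the 2016 subfolder
```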
