I'm brand new to Python and could use some help importing multiple Excel files into separate pandas DataFrames. The following code works, but it imports everything into one DataFrame. I would like to import each file into its own DataFrame: df1, df2, df3, df4, df5, etc.

Anything helps, thank you!

import pandas as pd
import glob


def get_files():
    directory_path = input('Enter directory path: ')
    filenames = glob.glob(directory_path + '/*.xlsx')
    number_of_files = len(filenames)
    df = pd.DataFrame()
    for f in filenames:
        data = pd.read_excel(f, 'Sheet1')
        df = df.append(data)
    print(df)
    print(number_of_files)

get_files()
  • Can you clarify what exactly the issue is? Commented Mar 31, 2020 at 2:56
  • Couldn’t you just add them to a list or a dictionary, instead of appending them all to the same DataFrame? Commented Mar 31, 2020 at 3:02
  • I'm sure I could add them to a list or a dictionary, but my experience in Python is quite limited. I'm not even sure what the code would look like. Commented Apr 1, 2020 at 2:40

2 Answers

The easiest way to do that is to use a list, where each element of the list is a DataFrame:

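import glob
import pandas as pd


def get_files():
    directory_path = input('Enter directory path: ')
    filenames = glob.glob(directory_path + '/*.xlsx')
    number_of_files = len(filenames)
    # Collect one DataFrame per file instead of appending to a single frame
    df_list = []
    for f in filenames:
        data = pd.read_excel(f, 'Sheet1')
        df_list.append(data)
    print(df_list)
    print(number_of_files)
    return df_list

# Keep the returned list so the DataFrames can be used afterwards
df_list = get_files()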

You can then access your dataframes with df_list[0], df_list[1]...
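
As a rough sketch of what working with that list might look like: the two small frames below are made-up stand-ins (not from the original post) so the snippet runs without any Excel files; in practice the list would come from get_files().

```python
import pandas as pd

# Stand-in frames (hypothetical data) so the sketch runs without Excel files;
# in practice this list would be the return value of get_files().
df_list = [
    pd.DataFrame({'a': [1, 2]}),
    pd.DataFrame({'a': [3, 4, 5]}),
]

# Index into the list instead of creating df1, df2, ... by hand.
first = df_list[0]

# Loop over every frame together with its position in the list.
row_counts = []
for i, df in enumerate(df_list):
    row_counts.append(len(df))
    print(f'df_list[{i}] has {len(df)} rows')

# If the number of files is known and fixed, unpacking gives separate names.
df1, df2 = df_list
```

Unpacking only works when the number of names matches the number of files exactly, so indexing or looping is usually the safer choice when the directory contents can change.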


As another option, here is an approach based on Jezrael's answer at https://stackoverflow.com/a/52074347/13160821, modified for your code.

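import glob
from os.path import basename

import pandas as pd


def get_files():
    directory_path = input('Enter directory path: ')
    filenames = glob.glob(directory_path + '/*.xlsx')
    number_of_files = len(filenames)

    # Map each file name (without its directory) to its DataFrame
    dfs = {basename(f): pd.read_excel(f, 'Sheet1') for f in filenames}

    print(number_of_files)
    return dfs

# Keep the returned dictionary so the DataFrames can be looked up by name
dfs = get_files()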

The DataFrames can then be accessed by filename, e.g. dfs['file_name1.xlsx'] or dfs['some_file.xlsx']. You can also use splitext to remove the .xlsx extension from the key, or use just part of the filename.
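
A minimal sketch of the splitext idea, under some assumptions: frame_key and load_frames are hypothetical helper names, the example paths are invented, and the reader argument is a placeholder for whatever loads a file (for instance a function that calls pd.read_excel(f, 'Sheet1')).

```python
from os.path import basename, splitext


def frame_key(path):
    """Return the file name without its directory or extension."""
    return splitext(basename(path))[0]


def load_frames(filenames, reader):
    """Map each file's extension-free name to whatever reader(path) returns."""
    return {frame_key(f): reader(f) for f in filenames}


# Hypothetical paths, used only to show the resulting keys.
example_paths = ['reports/file_name1.xlsx', 'reports/some_file.xlsx']
keys = [frame_key(p) for p in example_paths]
print(keys)  # ['file_name1', 'some_file']
```

With pandas, this could be called as dfs = load_frames(filenames, lambda f: pd.read_excel(f, 'Sheet1')), after which the frames would be looked up as dfs['file_name1'] rather than dfs['file_name1.xlsx']. Note that two files with the same name but different extensions would collide on the same key.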

1 Comment

Happy to help with splitext, if you need it. Would just need to know what the filenames look like and what part of them you'd like to use for the key.
