1

I have n excel files and I need to sort them according to a column's value. In fact, I need to organize my excel files placed under a specific folder in creating subfolders and each subfolder contains excel files with the same DEPTNAME, knowing that DEPTNAME is a column name and each excel file has m sheets but all the sheets have the same DEPTNAME.

Example: A folder with 4 excel files:

df1= pd.DataFrame({'Last Name':[‘Stark’, ‘Stark’, ‘ Stark’, ‘Stark’],
 'FirstName':['Arya', ,'Arya','Arya','Arya',],
 'DEPTNAME':['Sécu','Sécu','Sécu','Sécu']})

enter image description here

df2= pd.DataFrame({'Last Name':[‘Lannister’, ‘Lannister’, ‘ Lannister’, ‘Lannister’],
 'FirstName':['Cersei', ,'Cersei','Cersei','Cersei',],
 'DEPTNAME':['Auto','Auto','Auto','Auto']})

enter image description here

df3= pd.DataFrame({'Last Name':[‘Snow’, ‘Snow’, ‘ Snow’, ‘Snow’, ‘ Snow’, ‘Snow’],
         'FirstName':['Jon', 'Jon','Jon','Jon','Jon','Jon'],
         'DEPTNAME':['Aero','Aero','Aero','Aero','Aero','Aero']})

enter image description here

df4= pd.DataFrame({'Last Name':[‘Lannister’, ‘Lannister’, ‘ Lannister’, ‘Lannister’],
         'FirstName':['Tyrion', 'Tyrion','Tyrion','Tyrion',],
         'DEPTNAME':['Aero','Aero','Aero','Aero']})

enter image description here

Now I need to create automatically 3 folders: Sécu, Aero and Auto.

Sécu will contain one excel file

Aero will contain two excel files

Auto will contain one excel file

is it doable knowing that my initial folder contains n excel files with multiple sheets?

1
  • 1
    all the data files have the same columns but the length differs Commented Dec 26, 2019 at 15:39

1 Answer 1

3

here is one way which combines all files in a folder and all sheets in each file and then groups on DEPTNAME and the filename + sorts the files in the folder(Note: if same DEPTNAME are in 2 different excel fies, they are saves as 2 different files in the same folder <- as requested):

def myf(folder,files_to_be_created_in_folder):
    """ folder is the path to input files and files_to_be_created_in_folder
         is the path where the directories are to be created"""
    folder = folder
    list_of_files=os.listdir(folder)
    combined_sheets={i[:-5]:pd.concat(pd.read_excel(os.path.join(folder,i),sheet_name=None)
        .values(),sort=False)for i in list_of_files}
    combined_all_files=pd.concat(combined_sheets.values(),keys=combined_sheets.keys())
    d={i:g for i,g in combined_all_files.groupby(['DEPTNAME'
             ,combined_all_files.index.get_level_values(0)])}
    to_create_folder=files_to_be_created_in_folder
    for k,v in d.items():
        newpath=os.path.join(to_create_folder,k[0])
        if not os.path.exists(newpath):
            os.makedirs(newpath)
        v.to_excel(os.path.join(newpath,f"{k[1]}.xlsx"),index=False)

myf(r'C:\path_to_files\test_folder',r'C:\path_to_write\New folder') #replace paths carefully

For testing I have tried t print a folder tree based on this solution which depicts a folder tree:

ptree(r'C:\path_to_files\test_folder')

test_folder/
|-- test_1.xlsx
|-- test_2.xlsx
|-- test_3.xlsx
|-- test_4.xlsx

ptree(r'C:\path_to_write\New folder') #this also has the test folder

New folder/
|-- Aero/
|   |-- test_3.xlsx
|   |-- test_4.xlsx
|-- Auto/
|   |-- test_2.xlsx
|-- Sécu/
|   |-- test_1.xlsx
|-- test_folder/
|   |-- test_1.xlsx
|   |-- test_2.xlsx
|   |-- test_3.xlsx
|   |-- test_4.xlsx
Sign up to request clarification or add additional context in comments.

3 Comments

thank you, it worked like magic but can you please explain to me the utility of this line of code myf(r'C:\path_to_files\test_folder',r'C:\path_to_write\New folder') because my dataframes were already sorted is this directory files_to_be_created_in_folder
@A.khou that line is just to execute the function, since all the code is wrapped under the function myf , which takes 2 parameters, your input folder path and the output folder path, if you delete the output folders and just run myf with the params, you would see the code creating 3 folders in the output directory
if you want the new folders to be created in the same input folder, pass the same folder path as input and output in the function

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.