Hello, I have data in several Excel sheets spread across different subfolders. So far I've been able to write code that extracts the columns I need and saves them in a dictionary. Here's the code:

import os
import pandas as pd

#Path to the root folder using os

FOLDER_PATH = r'C:\Users\Sarah\Desktop\test'

def listDir(dir):
    # Print the name and absolute path of every entry directly inside dir
    filenames = os.listdir(dir)
    for filename in filenames:
        print('File Name:' + filename)
        print('folder Path:' + os.path.abspath(os.path.join(dir, filename)), sep='\n')

listDir(FOLDER_PATH)

#Display sheets names using pandas

pd.set_option('display.width',300)
mosul_file = (r'C:\Users\Sarah\Desktop\test\Months\March.xlsx')
mosul_file2 =(r'C:\Users\Sarah\Desktop\test\Months\April.xlsx')
mosul_file3 =(r'C:\Users\Sarah\Desktop\test\Months\May.xlsx')
mosul_file7 =(r'C:\Users\Sarah\Desktop\test\Months\July.xlsx')
xl = pd.ExcelFile(mosul_file)
xl2 = pd.ExcelFile(mosul_file2)
xl3 = pd.ExcelFile(mosul_file3)
xl7 = pd.ExcelFile(mosul_file7)


#Display headers index

mosul_df = xl.parse(0, header=[1], index_col=[0,1,2])
mosul_df2 = xl2.parse(0, header=[0], index_col=[0,1,2])
mosul_df3 = xl3.parse(0, header=[0], index_col=[0,1,2])
mosul_df7 = xl7.parse(1, header=[0], index_col=[0,1,2])


#Read Excel and Select columns

mosul_file = pd.read_excel(r'C:\Users\Sarah\Desktop\test\Months\March.xlsx', sheet_name=0,
                           index_col=None, na_values=['NA'], usecols="C,F,G")
mosul_file2 = pd.read_excel(r'C:\Users\Sarah\Desktop\test\Months\April.xlsx', sheet_name=0,
                            index_col=None, na_values=['NA'], usecols="C,F,G")
mosul_file3 = pd.read_excel(r'C:\Users\Sarah\Desktop\test\Months\May.xlsx', sheet_name=0,
                            index_col=None, na_values=['NA'], usecols="C,F,G")
mosul_file7 = pd.read_excel(r'C:\Users\Sarah\Desktop\test\Months\July.xlsx', sheet_name=0,
                            index_col=None, na_values=['NA'], usecols="C,F,G")

#Remove NaN values

data_mosul_df = mosul_file.apply(pd.to_numeric, errors='coerce')
data_mosul_df = data_mosul_df.dropna()
data_mosul_df2 = mosul_file2.apply(pd.to_numeric, errors='coerce')
data_mosul_df2 = data_mosul_df2.dropna()
data_mosul_df3 = mosul_file3.apply(pd.to_numeric, errors='coerce')
data_mosul_df3 = data_mosul_df3.dropna()
data_mosul_df7 = mosul_file7.apply(pd.to_numeric, errors='coerce')
data_mosul_df7 = data_mosul_df7.dropna()

#Save to Dictionary

datamosul1 = data_mosul_df.to_dict()
datamosul2 = data_mosul_df2.to_dict()
datamosul3 = data_mosul_df3.to_dict()
datamosul7 = data_mosul_df7.to_dict()

How can I automate this so that it loops through all the folders and subfolders? Thank you.

  • Have a read of this; it uses pathlib and some error handling to grab multiple workbooks and spreadsheets. Commented Jan 13, 2020 at 11:32
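
The approach this comment describes can be sketched roughly as follows. This is only an illustration and not the linked code: the function name `load_workbooks` and its `reader` parameter are hypothetical, and in practice `reader` would be something like `pd.read_excel`.

```python
from pathlib import Path


def load_workbooks(root, reader, pattern="*.xlsx"):
    """Apply reader to every file under root that matches pattern.

    Returns (loaded, failed): loaded maps each path to the reader's result,
    failed maps each path whose read raised to the exception it raised.
    """
    loaded, failed = {}, {}
    # rglob descends into every subfolder of root.
    for path in sorted(Path(root).rglob(pattern)):
        try:
            loaded[path] = reader(path)
        except Exception as exc:  # keep going if one workbook is unreadable
            failed[path] = exc
    return loaded, failed
```

With pandas installed, `load_workbooks(FOLDER_PATH, pd.read_excel)` would be expected to return one DataFrame per workbook, with any unreadable files collected in `failed` instead of stopping the loop.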

2 Answers

import os
import pandas as pd

path = './Results'
my_files = []
for (dirpath, dirnames, filenames) in os.walk(path):
    for filename in filenames:
        # Keep the full path; a bare file name only resolves from the current directory
        if filename.endswith('.xlsx'):
            my_files.append(os.path.join(dirpath, filename))

print(my_files)

all_dicts_list = []
for file_name in my_files:
    #.....

    #Read Excel and Select columns

    mosul_file = pd.read_excel(file_name, sheet_name=0,
                               index_col=None, na_values=['NA'], usecols="C,F,G")

    #Remove NaN values

    data_mosul_df = mosul_file.apply(pd.to_numeric, errors='coerce')
    data_mosul_df = data_mosul_df.dropna()

    #Save to Dictionary

    datamosul1 = data_mosul_df.to_dict()
    all_dicts_list.append(datamosul1)


#all dictionaries will be in all_dicts_list

1 Comment

Hi, thanks again. The first part works and prints the files, but there seems to be a problem with the loop. The error message is "FileNotFoundError: [Errno 2] No such file or directory".
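
This error is typically what `pd.read_excel` raises when it is handed a bare file name that does not exist in the current working directory. `os.walk` yields file names relative to `dirpath`, so each one has to be joined with `dirpath` before it is opened. A minimal sketch of collecting full paths; the helper name `collect_excel_paths` and the `.xlsx` filter are assumptions added for illustration:

```python
import os


def collect_excel_paths(root, extension=".xlsx"):
    """Return the full paths of all files under root ending with extension."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(extension):
                # Join with dirpath so the path is valid from any working directory.
                paths.append(os.path.join(dirpath, filename))
    return sorted(paths)
```

Each returned path can then be passed to `pd.read_excel` inside the loop in place of the bare file name.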

If I understand you correctly, you want to get all the filenames from folders and subfolders. I hope the following code works for you; please set path to your root folder.

from os import walk

path = './test'
my_files = []
for (dirpath, dirnames, filenames) in walk(path):
    my_files.extend(filenames)

print(my_files)

1 Comment

Thank you. Sorry, my question was confusing. What I actually mean is how to automate everything from step 2 (#Display sheets names using pandas) to step 5 (#Save to Dictionary), because for now I am just copy/pasting file paths. Thank you!
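
One way to automate steps 2 to 5 is to move them into a function that takes a file path, then call it once per workbook. A minimal sketch, assuming every workbook keeps its data in the first sheet and in the same columns C, F and G as in the question; the names `clean_to_dict` and `extract_all` are hypothetical:

```python
from pathlib import Path

import pandas as pd


def clean_to_dict(df):
    """Coerce every column to numeric, drop rows containing NaN, return a dict."""
    numeric = df.apply(pd.to_numeric, errors="coerce")
    return numeric.dropna().to_dict()


def extract_all(root):
    """Map each .xlsx path under root to its cleaned columns C, F and G."""
    results = {}
    for path in sorted(Path(root).rglob("*.xlsx")):
        df = pd.read_excel(path, sheet_name=0, na_values=["NA"], usecols="C,F,G")
        results[str(path)] = clean_to_dict(df)
    return results
```

`extract_all(FOLDER_PATH)` would then return one dictionary per workbook, keyed by its path, so no file path has to be typed by hand. A workbook that keeps its data in a different sheet (the question parses the July file with sheet index 1) would need its own `sheet_name`.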
