3

I need some help from Python programmers with a data-processing problem:

  • I have .csv files placed in a directory structure like this:

    -MainDirectory

    • Sub directory 1
      • sub directory 1A
        • file.csv
    • Sub directory 2
      • sub directory 2A
        • file.csv
    • Sub directory 3
      • sub directory 3A
        • file.csv

    Instead of going into each directory and opening the .csv files one by one, I want to run a script that combines the data from all the sub directories.

Every file has the same header. I need to end up with one big .csv file that contains the header only once, with the data from all the .csv files appended one after the other.

I already have a Python script that combines all the files into a single file, but it only works when those files are in one folder.

Can you help with a script that handles the directory structure above?

2
  • Since you already have a script that works when all the files are in one folder, I think all you need now is to fetch all the csv files in the tree, right? Commented Jul 11, 2013 at 6:52
  • Yes, I just need to put them in one single folder, but the files under the different directories have the same name, so I would need to rename them before putting them in a single folder, and I don't want to rename them manually one by one. Commented Jul 11, 2013 at 7:13

3 Answers

3

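Try this code. I tested it on my laptop and it works well. It walks the whole directory tree, writes the header once, and skips the header line of every later file.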

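import os

def mergeCSV(srcDir, destCSV):
    # Resolve the output path so it can be skipped if os.walk finds it under srcDir.
    destPath = os.path.abspath(destCSV)
    with open(destCSV, 'w') as destFile:
        header = ''
        # os.walk visits srcDir and every subdirectory below it.
        for root, dirs, files in os.walk(srcDir):
            for f in files:
                srcPath = os.path.join(root, f)
                if not f.endswith(".csv") or os.path.abspath(srcPath) == destPath:
                    continue
                with open(srcPath, 'r') as csvfile:
                    if header == '':
                        # First file: keep its header line.
                        header = csvfile.readline()
                        destFile.write(header)
                    else:
                        # Later files: discard the repeated header.
                        csvfile.readline()
                    for line in csvfile:
                        destFile.write(line)

if __name__ == '__main__':
    mergeCSV('D:/csv', 'D:/csv/merged.csv')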

0

You don't have to put all the files in one folder. To work with a file, all you need is its path. So gather the paths of all the csv files first, and then perform the combination.

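    import os

    # Filled in by Test1() with the full path of every .csv file found.
    csvfiles = []

    def Test1(rootDir):
        # os.walk yields rootDir and every directory below it, at any depth.
        list_dirs = os.walk(rootDir)
        for root, dirs, files in list_dirs:
            for f in files:
                if f.endswith('.csv'):
                    csvfiles.append(os.path.join(root, f))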
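The combination step that follows the path gathering could look roughly like the sketch below. The names `find_csv_files` and `merge_csv_files` are hypothetical and not part of the code above. The sketch assumes every file starts with the same one-line header, and it uses the standard `csv` module so that quoted fields are copied intact.

```python
import csv
import os


def find_csv_files(root_dir):
    """Return the paths of all .csv files under root_dir, sorted for a stable order."""
    paths = []
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            if name.endswith(".csv"):
                paths.append(os.path.join(root, name))
    return sorted(paths)


def merge_csv_files(paths, dest_csv):
    """Write the rows of every file in paths to dest_csv, keeping one header."""
    dest_abs = os.path.abspath(dest_csv)
    header_written = False
    with open(dest_csv, "w", newline="") as dest:
        writer = csv.writer(dest)
        for path in paths:
            # Never read the output file itself, e.g. one left by an earlier run.
            if os.path.abspath(path) == dest_abs:
                continue
            with open(path, "r", newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)  # None means the file is empty
                if header is None:
                    continue
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # The header of this file is already consumed; copy the data rows.
                writer.writerows(reader)
```

A call such as `merge_csv_files(find_csv_files('MainDirectory'), 'merged.csv')` would then produce the single combined file. Files with the same name in different sub directories are not a problem, because each one is addressed by its full path.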

2 Comments

In my directory structure I have many sub directories, so would it be able to find the .csv files? Also, each .csv file has the same header, so when I append them I don't want the header of the second csv file to end up after the data of the first file. All I need is one header.
The function gathers all the csv files under the root dir, even those in subdirectories. As for the combination, you said you already have a script. I think you can use that script with just a small modification. @user2159674
0

You can use os.listdir() to get the list of entries in a directory.
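Because os.listdir() returns the entries of one directory only, it has to be called again for each subdirectory to cover a nested tree. A minimal sketch of that recursion follows; the function name `list_csv_files` is hypothetical.

```python
import os


def list_csv_files(directory):
    """Recursively collect .csv paths under directory using os.listdir()."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            # Descend into the subdirectory and collect its files too.
            found.extend(list_csv_files(path))
        elif name.endswith(".csv"):
            found.append(path)
    return found
```

This does by hand what os.walk() does internally, so os.walk() is usually the shorter option.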
