I am new to Python and I am using it to do some data analysis.

My problem is the following: I have a directory with many subdirectories, each one of which contains a large number of data files.

I already wrote a Python script which, when executed in one of those subdirectories, performs the data analysis and writes the results to an output file. The script includes some shell commands that I call using os.system(), so I have to "be" in one of the subdirectories for it to work.

How can I write a function that automatically:

  1. Moves into the first subdirectory
  2. Executes the script
  3. Goes back to the parent directory and moves to the next subdirectory

I guess that this could be done in some way using os.walk() but I didn't really understand how it works.

PS I am aware of the existence of this post but it doesn't solve my problem.

PPS Maybe I should point out that my function does not take the directory name as argument. Actually it takes no argument.

5 Answers

3

To change your working directory in Python you need:

os.chdir(your_path)

You can then recursively run your script.

Example Code:

import os

directory_to_check = "your_dir" # Which directory do you want to start with?

def my_function(directory):
      print("Listing: " + directory)
      print("\t-" + "\n\t-".join(os.listdir("."))) # List current working directory

# Get all the subdirectories of directory_to_check recursively and store them in a list:
directories = [os.path.abspath(x[0]) for x in os.walk(directory_to_check)]
directories.remove(os.path.abspath(directory_to_check)) # If you don't want your main directory included

for i in directories:
      os.chdir(i)         # Change working Directory
      my_function(i)      # Run your function

I don't know how your script works because your question is quite general, so I can only give a general answer.

But I think what you need is:

  1. Get all subdirectories and store them using os.walk
  2. Change your working directory with os.chdir

os.walk alone won't work, because it only lists the directories; it does not change the working directory for you.
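The two steps can be combined in a small helper. What follows is only a minimal sketch, assuming the analysis function takes no arguments and works on the current working directory. The names `working_directory` and `run_in_subdirectories` are hypothetical helpers written for this example, not part of the os module.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the working directory, then restore the old one."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # runs even if the body raises


def run_in_subdirectories(root, func):
    """Call func() once inside every directory below root (root itself excluded)."""
    root = os.path.abspath(root)  # absolute, so os.walk is unaffected by chdir
    visited = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            continue  # skip the starting directory
        with working_directory(dirpath):
            func()
        visited.append(dirpath)
    return visited
```

For example, `run_in_subdirectories("your_dir", my_analysis)` would call a hypothetical no-argument `my_analysis()` in each subdirectory and leave the working directory unchanged afterwards, even if the analysis raises an exception.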

I hope this helps! Good luck!


9 Comments

But this way I get stuck in the first subdirectory at the first iteration and I get "[Errno 2] No such file or directory: subdirectory_name". It should go back in the parent directory after the function is executed...
Yes. That is why I mentioned that you need absolute paths... I updated the code so that it suits your needs :)
Ok, I had to write "file" with quotes to make that line work (otherwise I get "name 'file' is not defined"), but it works! Except for one thing...for some reason the absolute path to the parent directory gets included in the "directories" list. How can I avoid that?
(I meant to write "files" with the underscores but I got bold instead)
what do you use in the directory_to_check variable? If you use '.' (to indicate the current directory) then what you say happens. But, try to run your script one directory above, and use directory_to_check='your_dir' to avoid this... (If I understand the problem correctly...)
3

os.walk should work perfectly for what you want to do. Get started with this code and you should see what you need to do:

import os

start_path = r'C:\mystartingpath'

# os.walk yields one (path, dirs, files) tuple per directory it visits
for (path, dirs, files) in os.walk(start_path):
    print("Path:", path)

    print("\nDirs:")
    for d in dirs:
        print('\t' + d)

    print("\nFiles:")
    for f in files:
        print('\t' + f)

    print("----")

What this code will do is show you that os.walk will iterate through all subdirectories of your chosen starting path. Once in each directory, you can get the full path to each file name by concatenating the path and the file name. For example:

path_to_interesting_file = os.path.join(path, filename)

# (This assumes that you saved your file name in a variable called filename)

With the full path to each file, you can perform your analysis while in the os.walk for loop. Add your analysis code so that the for loop is doing more than just printing contents.
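As a rough illustration of working with full paths inside the os.walk loop, the sketch below collects the path of every data file under a starting directory without changing the working directory. The function name `list_data_files` and the `.txt` suffix are assumptions made for this example; the question does not say what the data files are called.

```python
import os


def list_data_files(root, suffix=".txt"):
    """Return the full paths of all files under root whose names end with suffix."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(suffix):
                # os.path.join picks the right separator for the platform
                found.append(os.path.join(dirpath, filename))
    return sorted(found)
```

Each returned path can be passed straight to `open()` or to the analysis code, so the loop body no longer depends on which directory the script was started from.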


1

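This can be done as follows:

for name in os.listdir(your_root_directory):
    path = os.path.join(your_root_directory, name)
    if os.path.isdir(path):  # os.listdir also returns files
        yourFunction(path)

The os.listdir method returns the names of all entries (files and directories) in the root directory only; it does not descend into subdirectories.

The os.walk method, by contrast, traverses the directories recursively, which makes it useful for other tasks. When you only need the immediate subdirectories, os.listdir might be the better choice.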

However, for the sake of completeness, here is an os.walk option:

for name in next(os.walk(your_directory))[1]:
    yourFunction(os.path.join(your_directory, name))

Note that os.walk returns a generator, hence the next call. The first next call produces a tuple (root, dirs, files), where root is your starting directory. You are only interested in dirs, the list of subdirectory names, so you take index [1].
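The same idea reads a little more clearly when the tuple is unpacked by name instead of indexed. This is a small sketch; `immediate_subdirectories` is a hypothetical helper name chosen for the example.

```python
import os


def immediate_subdirectories(root):
    """Return the full paths of the directories directly inside root."""
    # The first tuple that os.walk yields describes root itself.
    dirpath, dirnames, filenames = next(os.walk(root))
    return [os.path.join(dirpath, name) for name in sorted(dirnames)]
```

One caveat: if `root` does not exist, os.walk yields nothing by default, so the `next` call raises StopIteration rather than a file-not-found error.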

6 Comments

Maybe I should have pointed this out, but my function doesn't take the directory name as argument. Actually it takes no argument.
Well, it should not be hard to make it so that it does. Otherwise you would need to use globals, which is bad form for python. Making a function take a folder on which it operates as an argument is what modularity is for. So that you can reuse it in other occasions.
Why is using globals bad form for Python?
For his application, the use of globals would be unnecessary. Also, generally globals defeat the purpose of blackbox idea behind programming. Even if doing OOP, functional programming still applies to many methods. Therefore it is a bad practice to use globals, unless in cases where you absolutely need them. But those cases are rare. They also make debugging a lot harder.
This link explains it better and in more depth: stackoverflow.com/questions/19158339/…
0

If you want to do a certain action for every sub-folder of a folder, one way is to write a recursive function, processing each directory one at a time. I hope my example helps a little bit: http://pastebin.com/8G7JzcQ2
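The linked example is not reproduced here, so the following is only a guess at what such a recursive function could look like, not the author's code. The name `for_each_subdirectory` and the `action` callback are assumptions made for this sketch.

```python
import os


def for_each_subdirectory(directory, action):
    """Recursively call action(path) for every directory below `directory`."""
    with os.scandir(directory) as entries:
        ordered = sorted(entries, key=lambda entry: entry.name)
    for entry in ordered:
        # Symbolic links are not followed, to avoid walking in circles.
        if entry.is_dir(follow_symlinks=False):
            action(entry.path)
            for_each_subdirectory(entry.path, action)  # descend one level
```

Calling `for_each_subdirectory("your_dir", print)` would print the path of every nested subdirectory, parents before their children.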

1 Comment

Please add the code to your answer. See How do I format my code blocks?
0

I was doing something similar: cd into every subdirectory, run git commands, and so on. Here is a shortened version:

import os
import subprocess

if __name__ == "__main__":
    # current working directory; run the script from the directory that holds the subdirectories
    ROOT_PATH = os.getcwd()

    # all files and folders in that directory
    for name in os.listdir(ROOT_PATH):
        dir_path = os.path.abspath(name)

        # if a subdirectory
        if os.path.isdir(dir_path):
            # cd to subdirectory
            os.chdir(dir_path)

            # could run a script subprocess.run(["python", "my_script.py"])
            # or you could run all commands here one by one
            git_log = subprocess.getoutput("git log -n1")  # getoutput expects a command string, not a list
            print(git_log + "\n")

            # move back to script's dir
            os.chdir(ROOT_PATH)
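If the work in each subdirectory is done by external commands, another option is to let subprocess set the working directory through its `cwd` argument, so the calling script never has to change directory itself. This is a minimal sketch under that assumption; `run_in_each_subdirectory` is a hypothetical helper name, and the command is passed as a list of arguments.

```python
import os
import subprocess


def run_in_each_subdirectory(root, command):
    """Run command (a list of arguments) in each immediate subdirectory of root.

    Returns a dict mapping each subdirectory name to the command's standard output.
    """
    outputs = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            continue  # skip regular files
        # cwd= makes the child process start in path; our own cwd is untouched
        result = subprocess.run(command, cwd=path, capture_output=True, text=True)
        outputs[name] = result.stdout
    return outputs
```

For instance, `run_in_each_subdirectory(".", ["git", "log", "-n1"])` would collect the latest commit message of every repository directly below the current directory.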
