0

I am looking to recursively search through a folder containing many sub-folders. Some sub-folders contain a specific folder that I want to loop through.

I am familiar with the glob.glob method to find specific files:

import glob, os
from os import listdir
from os.path import isfile, join

os.chdir(pathname) #change directory to path of choice
files = [f for f in glob.glob("filename.filetype") if isfile(join(idir, f))]

Some sub folders in a directory have a time stamp (YYYYMMDD) as their names all containing identical file names. Some of those sub folders contain folders within them with a name, let's call it "A". I'm hoping to create a code that will recursively search for the folder within called "A" within these "specific sub folders". Is there a way to use glob.glob to find these specific sub-folders within a directory?

I am aware of a similar question: How can I search sub-folders using glob.glob module in Python?

but this person seems to be looking for specific files, whereas I am looking for pathnames.

6
  • How would you define specific sub-folders Commented Mar 15, 2017 at 3:30
  • 1
    Possible duplicate of How can I search sub-folders using glob.glob module in Python? Commented Mar 15, 2017 at 3:33
  • Sorry about that, I should have been more clear. Some sub folders in a directory have a time stamp (YYYYMMDD) as their names all containing identical file names. Some of those sub folders contain folders within them with a name, let's call it "A". I'm hoping to create a code that will recursively search for the folder called "A" within these "specific sub folders". Commented Mar 15, 2017 at 3:33
  • 1
    Then you need to try os.walk() Commented Mar 15, 2017 at 3:36
  • Are these folders always at the same level? Suppose it starts 3 subdirs down, you could do glob('*/*/*/*[12][0-9][1-9][0-9][01][0-9][0123][0-9]*/A/') Commented Mar 15, 2017 at 4:29

1 Answer 1

1

You can use os.walk which will walk the tree. Each iteration shows you the directory and its immediate subdirectories, so the test is simple.

import os
import re

# regular expression to match YYYYMMDD timestamps (but not embedded in
# other numbers like 2201703011).
timestamp_check = re.compile(re.compile(r"[^\d]?[12]\d3[01]\d[0123]\d")).search

# Option 1: Stop searching a subtree if pattern is found
A_list = []
for root, dirs, files in os.walk(pathname):
    if timestamp_check(os.path.basename(root)) and 'A' in dirs:
        A_list.append(os.path.join(root, A))
        # inplace modification of `dirs` trims subtree search
        del dirs[:]

# Option 2: Search entire tree, even if matches found
A_list = [os.path.join(root, 'A') 
    for root, dirs, files in os.walk(pathname) 
    if timestamp_check(os.path.basename(root)) and 'A' in dirs]
Sign up to request clarification or add additional context in comments.

2 Comments

This is effectively what the nominated duplicate question's accepted answer amounts to.
@tripleee - they are similar, but OP has more requirements than "*.txt". This answer shows how to filter subdirectories and how to use a regular expression for a more exact match than you can get from fnmatch.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.