Python list directory, subdirectory, and files

Question

I'm trying to make a script to list all directories, subdirectories, and files in a given directory.

I tried this:

import sys, os

root = "/home/patate/directory/"
path = os.path.join(root, "targetdirectory")

for r, d, f in os.walk(path):
    for file in f:
        print(os.path.join(root, file))

Unfortunately, it doesn't work properly. I get all the files, but not their complete paths.

For example, if the directory structure were:

/home/patate/directory/targetdirectory/123/456/789/file.txt

It would print:

/home/patate/directory/targetdirectory/file.txt

I need the first result.

Colonel_Old · Accepted Answer · 2023-06-03 05:22:45Z

409

Use os.path.join to concatenate the directory and file name:

import os

root = "/home/patate/directory/targetdirectory"  # the directory to walk

for path, subdirs, files in os.walk(root):
    for name in files:
        print(os.path.join(path, name))

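Note the use of path, not root, in the join: os.walk yields the directory currently being visited as the first item of each tuple, so joining with root would drop the intermediate subdirectories.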
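To make the difference concrete, here is a minimal sketch that builds a throwaway tree in a temporary directory and lists both directories and files with their full paths. The helper name list_tree and the 123/456/file.txt layout (borrowed from the question) are only illustrative.

```python
import os
import tempfile


def list_tree(top):
    """Return (directories, files) found under top, each as a full path."""
    found_dirs, found_files = [], []
    for current, subdirs, files in os.walk(top):
        # `current` is the directory being visited, not the starting point.
        found_dirs.extend(os.path.join(current, d) for d in subdirs)
        found_files.extend(os.path.join(current, f) for f in files)
    return found_dirs, found_files


with tempfile.TemporaryDirectory() as top:
    nested = os.path.join(top, "123", "456")
    os.makedirs(nested)
    open(os.path.join(nested, "file.txt"), "w").close()

    dirs, files = list_tree(top)
    rel_dirs = sorted(os.path.relpath(d, top) for d in dirs)
    rel_files = [os.path.relpath(f, top) for f in files]
    print(rel_dirs)
    print(rel_files)
```

The file is reported under 123/456 because the join uses the directory that os.walk is currently visiting.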

In Python 3.4, the pathlib module was added for easier path manipulations. So the equivalent to os.path.join would be:

pathlib.PurePath(path, name)

The advantage of pathlib is that you can use a variety of useful methods on paths. If you use the concrete Path variant you can also do actual OS calls through them, like changing into a directory, deleting the path, opening the file it points to and much more.
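As a rough illustration of those extra methods, the sketch below wraps each os.walk result in a concrete pathlib.Path. The helper name walk_paths and the docs/notes.txt test file are assumptions made up for this example.

```python
import os
import tempfile
from pathlib import Path


def walk_paths(top):
    """Yield a pathlib.Path for every file below top."""
    for current, subdirs, files in os.walk(top):
        for name in files:
            yield Path(current, name)


with tempfile.TemporaryDirectory() as top:
    target = Path(top, "docs", "notes.txt")
    target.parent.mkdir(parents=True)
    target.write_text("hello")

    found = list(walk_paths(top))
    # Path objects expose the pieces of the path and can read the file directly.
    summary = [(p.name, p.suffix, p.parent.name, p.read_text()) for p in found]
    print(summary)
```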

edited Jun 3, 2023 at 5:22

Colonel_Old

9529 silver badges15 bronze badges

answered May 26, 2010 at 3:46

Eli Bendersky

276k92 gold badges371 silver badges427 bronze badges

4 Comments

zappa Over a year ago

this is the one and only useful answer for the many questions that have been asked concerning "how to get all files recursively in python".

Nir Over a year ago

list comprehension: all_files = [os.path.join(path, name) for path, subdirs, files in os.walk(folder) for name in files]

Ehsan Over a year ago

In Python 3, use parentheses for the print function: print(os.path.join(path, name))

Gianclgar Over a year ago

os.walk(root) would inspect all contents of root. if you want to look within the provided directory as said in the original question you should use os.walk(path)

Ivan Pirog · Accepted Answer · 2022-03-29 18:25:34Z

84

Just in case... Getting all files in the directory and subdirectories matching some pattern (*.py for example):

import os
from fnmatch import fnmatch

root = '/some/directory'
pattern = "*.py"

for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            print(os.path.join(path, name))

edited Mar 29, 2022 at 18:25

answered Nov 4, 2012 at 0:38

Ivan Pirog

3,2061 gold badge20 silver badges8 bronze badges

3 Comments

Ahmad Ismail Over a year ago

In Python 3, use parentheses for the print function: print(os.path.join(path, name)). You can also use print(pathlib.PurePath(path, name)).

ash17 Over a year ago

same check could be done with simple string .endswith() method ;) fnmatch uses unix-shell wildcards: docs.python.org/3/library/fnmatch.html

Lauloque Over a year ago

if all you need is to check the file extension, personally I'd prefer using if name.endswith(".py"): instead of importing a module.
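A small sketch of the endswith variant suggested in these comments. str.endswith also accepts a tuple of suffixes, which covers several extensions in one call. The helper name find_by_suffix and the sample file names are made up for the example.

```python
import os
import tempfile


def find_by_suffix(top, suffixes):
    """Return full paths of files under top whose names end with any suffix."""
    matches = []
    for current, subdirs, files in os.walk(top):
        for name in files:
            # str.endswith accepts a tuple, so several extensions need one call.
            if name.endswith(suffixes):
                matches.append(os.path.join(current, name))
    return matches


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "pkg"))
    for rel in ("a.py", "b.txt", os.path.join("pkg", "c.pyi")):
        open(os.path.join(top, rel), "w").close()

    found = sorted(os.path.relpath(p, top) for p in find_by_suffix(top, (".py", ".pyi")))
    print(found)
```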

Peter Mortensen · Accepted Answer · 2023-04-13 07:31:51Z

16

Another option would be using the glob module from the standard library:

import glob

pattern = "/home/patate/directory/targetdirectory/**"

for path in glob.glob(pattern, recursive=True):
    print(path)

If you need an iterator instead of a list, you can use iglob as an alternative:

for path in glob.iglob(pattern, recursive=True):
    print(path)
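One thing to keep in mind: with recursive=True, the ** pattern matches directories as well as files. If only files are wanted, a filter along these lines may help. This is a sketch; glob_files is a hypothetical helper name, and by default glob also skips names that start with a dot.

```python
import glob
import os
import tempfile


def glob_files(top):
    """Return only regular files matched by a recursive '**' pattern."""
    pattern = os.path.join(top, "**")
    # With recursive=True, '**' also matches directories, so filter them out.
    return [p for p in glob.iglob(pattern, recursive=True) if os.path.isfile(p)]


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("top.txt", os.path.join("sub", "deep.txt")):
        open(os.path.join(top, rel), "w").close()

    found = sorted(os.path.relpath(p, top) for p in glob_files(top))
    print(found)
```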

edited Apr 13, 2023 at 7:31

Peter Mortensen

31.4k22 gold badges110 silver badges134 bronze badges

answered Nov 11, 2021 at 23:25

Rotareti

54.7k24 gold badges122 silver badges115 bronze badges

Comments

Peter Mortensen · Accepted Answer · 2023-11-15 23:39:53Z

14

Here is a one-liner:

import os

[val for sublist in [[os.path.join(i[0], j) for j in i[2]] for i in os.walk('./')] for val in sublist]
# Meta comment to ease selecting text

The outermost val for sublist in ... loop flattens the list so that it is one-dimensional. The j loop collects every file basename and joins it to the current path. Finally, the i loop iterates over all directories and subdirectories.

This example uses the hard-coded path ./ in the os.walk(...) call; you can substitute any path string you like.

Note: os.path.expanduser and/or os.path.expandvars can be used for paths strings like ~/
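For instance, a path can be normalized before it is handed to os.walk roughly like this. The helper name normalize and the DEMO_DIR environment variable are assumptions invented for the example.

```python
import os


def normalize(path):
    """Expand '~' and environment variables, then make the path absolute."""
    return os.path.abspath(os.path.expanduser(os.path.expandvars(path)))


os.environ["DEMO_DIR"] = "demo"  # hypothetical variable, set only for this example
expanded = normalize(os.path.join("~", "$DEMO_DIR"))
print(expanded)
```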

Extending this example:

It’s easy to add in file basename tests and directoryname tests.

For example, testing for *.jpg files:

... for j in i[2] if j.endswith('.jpg')] ...

Additionally, excluding the .git directory:

... for i in os.walk('./') if '.git' not in i[0].split('/')]
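The filter above still walks through .git and only discards the results afterwards. A different technique, not what the one-liner does, is to prune the directory list in place so that os.walk never descends into it. A minimal sketch, with walk_without as a hypothetical helper name:

```python
import os
import tempfile


def walk_without(top, skipped=(".git",)):
    """Return files under top, never descending into directories in `skipped`."""
    results = []
    for current, subdirs, files in os.walk(top):
        # Editing `subdirs` in place (top-down walk) stops os.walk from entering them.
        subdirs[:] = [d for d in subdirs if d not in skipped]
        results.extend(os.path.join(current, name) for name in files)
    return results


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, ".git", "objects"))
    os.makedirs(os.path.join(top, "src"))
    for rel in (os.path.join(".git", "objects", "blob"), os.path.join("src", "main.py")):
        open(os.path.join(top, rel), "w").close()

    found = [os.path.relpath(p, top) for p in walk_without(top)]
    print(found)
```

This only works with the default topdown=True, because the list has to be edited before os.walk visits the subdirectories.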

edited Nov 15, 2023 at 23:39

Peter Mortensen

31.4k22 gold badges110 silver badges134 bronze badges

answered Sep 26, 2014 at 21:03

ThorSummoner

18.6k18 gold badges144 silver badges156 bronze badges

3 Comments

Roman Rdgz Over a year ago

It does work, but to exclude the .git directory you need to check that '.git' is NOT in the path.

Roman Rdgz Over a year ago

Yep. Should be if '.git' not in i[0].split('/')]

ThorSummoner Over a year ago

I would recommend os.walk over a manual dirlisting loop, generators are great, go use them.

Jean-François Fabre · Accepted Answer · 2020-04-25 08:47:59Z

5

A bit simpler one-liner:

import os
from itertools import product, chain

top = "."  # directory to walk (avoid naming it dir, which shadows the builtin)
files = list(chain.from_iterable([[os.sep.join(w) for w in product([i[0]], i[2])] for i in os.walk(top)]))

edited Apr 25, 2020 at 8:47

Jean-François Fabre♦

141k24 gold badges179 silver badges246 bronze badges

answered Feb 21, 2018 at 8:44

Daniel

761 silver badge2 bronze badges

1 Comment

Aakash Gupta Over a year ago

how do I list each file ?
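To answer the comment: chain.from_iterable returns a lazy iterator, so each file shows up once you loop over it or pass it to list(). A sketch of that, using os.path.join instead of product for readability (iter_files is a hypothetical name):

```python
import os
import tempfile
from itertools import chain


def iter_files(top):
    """Lazily yield the full path of every file under top."""
    return chain.from_iterable(
        [os.path.join(current, name) for name in files]
        for current, subdirs, files in os.walk(top)
    )


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(top, rel), "w").close()

    listed = []
    for full_path in iter_files(top):  # one path per iteration
        listed.append(os.path.relpath(full_path, top))
    listed.sort()
    print(listed)
```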

Peter Mortensen · Accepted Answer · 2023-04-13 07:32:09Z

4

You can take a look at this sample I made. Beware: it uses the os.path.walk function, which was deprecated in Python 2 and removed in Python 3, so it only runs on Python 2. It stores all the file paths in a list.

import fnmatch
import os  # note: os.path.walk exists only in Python 2

root = "Your root directory"
ex = ".txt"
where_to = "Wherever you wanna write your file to"

def fileWalker(ext, dirname, names):
    '''
    checks files in names'''
    pat = "*" + ext[0]
    for f in names:
        if fnmatch.fnmatch(f, pat):
            ext[1].append(os.path.join(dirname, f))


def writeTo(fList):

    with open(where_to, "w") as f:
        for di_r in fList:
            f.write(di_r + "\n")


if __name__ == '__main__':
    li = []
    os.path.walk(root, fileWalker, [ex, li])

    writeTo(li)

edited Apr 13, 2023 at 7:32

Peter Mortensen

31.4k22 gold badges110 silver badges134 bronze badges

answered May 4, 2013 at 23:02

devsaw

1,0473 gold badges14 silver badges28 bronze badges

Comments

Puddle · Accepted Answer · 2023-05-09 01:44:39Z

Since every example here is just using walk (with join), I'd like to show a nice example and comparison with listdir:

import os, time

def listFiles1(root): # listdir
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0)+"/"; items = os.listdir(folder) # items = folders + files
        for i in items: i=folder+i; (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles2(root): # listdir/join (takes ~1.4x as long) (and uses '\\' instead)
    allFiles = []; walk = [root]
    while walk:
        folder = walk.pop(0); items = os.listdir(folder) # items = folders + files
        for i in items: i=os.path.join(folder,i); (walk if os.path.isdir(i) else allFiles).append(i)
    return allFiles

def listFiles3(root): # walk (takes ~1.5x as long)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[folder.replace("\\","/")+"/"+file] # folder+"\\"+file still ~1.5x
    return allFiles

def listFiles4(root): # walk/join (takes ~1.6x as long) (and uses '\\' instead)
    allFiles = []
    for folder, folders, files in os.walk(root):
        for file in files: allFiles+=[os.path.join(folder,file)]
    return allFiles


for i in range(100): files = listFiles1("src") # warm up

start = time.time()
for i in range(100): files = listFiles1("src") # listdir
print("Time taken: %.2fs"%(time.time()-start)) # 0.28s

start = time.time()
for i in range(100): files = listFiles2("src") # listdir and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.38s

start = time.time()
for i in range(100): files = listFiles3("src") # walk
print("Time taken: %.2fs"%(time.time()-start)) # 0.42s

start = time.time()
for i in range(100): files = listFiles4("src") # walk and join
print("Time taken: %.2fs"%(time.time()-start)) # 0.47s

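In these runs the plain listdir version was the fastest (os.walk took roughly 1.5x as long), and os.path.join added measurable overhead. The numbers come from one machine and one directory tree, so it is worth timing on your own data.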
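For what it's worth, since Python 3.5 os.walk is itself built on os.scandir, so results on newer versions may differ from the timings above. A hand-rolled scandir loop in the same style could look like the sketch below. It has not been benchmarked here, and list_files_scandir is a made-up name.

```python
import os
import tempfile


def list_files_scandir(root):
    """Iteratively collect file paths under root using os.scandir."""
    all_files = []
    pending = [root]
    while pending:
        folder = pending.pop()
        with os.scandir(folder) as entries:
            for entry in entries:
                # DirEntry caches the file type, so is_dir() usually avoids an extra stat call.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                else:
                    all_files.append(entry.path)
    return all_files


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "src", "inner"))
    for rel in ("a.txt", os.path.join("src", "b.txt"), os.path.join("src", "inner", "c.txt")):
        open(os.path.join(top, rel), "w").close()

    found = sorted(os.path.relpath(p, top) for p in list_files_scandir(top))
    print(found)
```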

Peter Mortensen · Accepted Answer · 2023-06-22 20:57:21Z

4

On Python 3.4 or later, you can use pathlib.Path.rglob to recursively list the contents of the current directory and all subdirectories:

from pathlib import Path


def generate_all_files(root: Path, only_files: bool = True):
    for p in root.rglob("*"):
        if only_files and not p.is_file():
            continue
        yield p


for p in generate_all_files(Path("."), only_files=False):
    print(p)

If you want something copy-pasteable:

Example

Folder structure:

$ tree . -a
.
├── a.txt
├── bar
├── b.py
├── collect.py
├── empty
├── foo
│   └── bar.bz.gz2
├── .hidden
│   └── secrect-file
└── martin
    └── thoma
        └── cv.pdf

gives:

$ python collect.py
bar
empty
.hidden
collect.py
a.txt
b.py
martin
foo
.hidden/secrect-file
martin/thoma
martin/thoma/cv.pdf
foo/bar.bz.gz2
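rglob also takes a pattern, so filtering by extension needs no extra module. A short sketch under the assumption that only *.py files are wanted (python_files is a hypothetical helper, and the file names loosely follow the tree above):

```python
import tempfile
from pathlib import Path


def python_files(root):
    """Return every *.py file below root, sorted for stable output."""
    return sorted(p for p in Path(root).rglob("*.py") if p.is_file())


with tempfile.TemporaryDirectory() as top:
    base = Path(top)
    (base / "foo").mkdir()
    for rel in ("b.py", "a.txt", "foo/collect.py"):
        (base / rel).touch()

    found = [p.relative_to(base).as_posix() for p in python_files(base)]
    print(found)
```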

edited Jun 22, 2023 at 20:57

Peter Mortensen

31.4k22 gold badges110 silver badges134 bronze badges

answered Mar 19, 2022 at 8:20

Martin Thoma

138k174 gold badges687 silver badges1.1k bronze badges

9 Comments

Peter Mortensen Over a year ago

Where was this tested? For example, the name of the executable is python3 in later versions of Ubuntu.

Martin Thoma Over a year ago

I have no clue what you refer to.

Peter Mortensen Over a year ago

What operating system was this tested on? For instance, this will not work on some versions of Ubuntu.

Martin Thoma Over a year ago

I use Ubuntu 20.04 and sometimes Mac. Why do you think this would not work on Ubuntu?

Peter Mortensen Over a year ago

Because they removed the executable "python" in some versions of Ubuntu (with only the executable "python3" being available (by default)). E.g., from this: "From the launch of Ubuntu 22.04, you will only get Python 3.8 installed, and they are no longer shipping with Python2". Though I think it started in Ubuntu 18.04 already.

Peter Mortensen · Accepted Answer · 2023-04-13 07:26:00Z

1

And this is how you list it in case you want to list the files on SharePoint. Your path will probably start after the "\teams\" part.

import os

root = r"\\mycompany.sharepoint.com@SSL\DavWWWRoot\teams\MyFolder\Policies and Procedures\Deal Docs\My Deals"
# Avoid naming the variable list, which would shadow the builtin
all_files = [os.path.join(path, name) for path, subdirs, files in os.walk(root) for name in files]
print(all_files)

edited Apr 13, 2023 at 7:26

Peter Mortensen

31.4k22 gold badges110 silver badges134 bronze badges

answered Nov 5, 2021 at 5:03

Chadee Fouad

3,0192 gold badges30 silver badges34 bronze badges

2 Comments

Peter Mortensen Over a year ago

What is special about SharePoint? Can you elaborate?

Chadee Fouad Over a year ago

Most companies use SharePoint to store files.

Peter Mortensen · Accepted Answer · 2023-04-13 07:20:51Z

0

It's just an addition. With this, you can get the data into CSV format:

import os
from fnmatch import fnmatch

import pandas as pd  # install it first if needed: pip3 install pandas

root = "/home/kiran/Downloads/MainFolder" # It may have many subfolders and files inside
lst = []
pattern = "*.csv"      # I want to get only csv files
# pattern = "*"        # Note: Use this pattern instead to get all types of files
for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            lst.append(os.path.join(path, name))
df = pd.DataFrame({"filePaths": lst})
df.to_csv("filepaths.csv")
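If pandas is not available, the standard-library csv module can write the same single-column file. This is a sketch of that alternative, not part of the answer above; write_paths_csv is a hypothetical name.

```python
import csv
import os
import tempfile


def write_paths_csv(top, csv_path):
    """Write one row per file under top to csv_path and return the row count."""
    count = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filePaths"])  # same column name as the pandas version
        for current, subdirs, files in os.walk(top):
            for name in files:
                writer.writerow([os.path.join(current, name)])
                count += 1
    return count


# The CSV goes into a separate directory so it does not list itself.
with tempfile.TemporaryDirectory() as top, tempfile.TemporaryDirectory() as out:
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("one.csv", os.path.join("sub", "two.csv")):
        open(os.path.join(top, rel), "w").close()

    csv_path = os.path.join(out, "filepaths.csv")
    written = write_paths_csv(top, csv_path)
    with open(csv_path, newline="") as handle:
        rows = list(csv.reader(handle))
    print(written, rows[0])
```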

edited Apr 13, 2023 at 7:20

Peter Mortensen

31.4k22 gold badges110 silver badges134 bronze badges

answered Feb 1, 2021 at 8:49

kiran beethoju

1681 silver badge5 bronze badges

Comments

Peter Mortensen · Accepted Answer · 2023-04-13 07:29:20Z

A pretty simple solution would be to run a couple of subprocess calls to export the file listing into CSV format:

import subprocess

# Global variables for directory being mapped

location = '.' # Enter the path here.
pattern = '*.py' # Use this if you want to only return certain filetypes
rootDir = location.rpartition('/')[-1]
outputFile = rootDir + '_directory_contents.csv'

# Find the requested data and export to CSV, specifying a pattern if needed.
# The pattern is quoted so the shell does not expand it; -fprintf requires GNU find.
find_cmd = 'find ' + location + ' -name "' + pattern + '" -fprintf ' + outputFile + ' "%Y%M,%n,%u,%g,%s,%A+,%P\\n"'
subprocess.call(find_cmd, shell=True)

That command produces comma-separated values that can be easily analyzed in Excel.

f-rwxrwxrwx,1,cathy,cathy,2642,2021-06-01+00:22:00.2970880000,content-audit.py

The resulting CSV file doesn't have a header row, but you can use a second command to add one.

# Add headers to the CSV
headers_cmd = 'sed -i.bak 1i"Permissions,Links,Owner,Group,Size,ModifiedTime,FilePath" ' + outputFile
subprocess.call(headers_cmd, shell=True)
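The find and sed commands above assume a Unix-like system (and -fprintf is a GNU find extension). Where they are not available, a rough pure-Python counterpart can be put together with os.walk, os.stat and the csv module. The sketch below writes a smaller, different set of columns than the find format string and includes the header row directly; write_listing is a hypothetical name.

```python
import csv
import os
import stat
import tempfile
from datetime import datetime


def write_listing(top, csv_path):
    """Write permissions, size, modification time and relative path of each file."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Permissions", "Size", "ModifiedTime", "FilePath"])
        for current, subdirs, files in os.walk(top):
            for name in files:
                full = os.path.join(current, name)
                info = os.stat(full)
                writer.writerow([
                    stat.filemode(info.st_mode),  # e.g. a string like -rw-r--r--
                    info.st_size,
                    datetime.fromtimestamp(info.st_mtime).isoformat(),
                    os.path.relpath(full, top),
                ])


with tempfile.TemporaryDirectory() as top, tempfile.TemporaryDirectory() as out:
    with open(os.path.join(top, "a.py"), "w") as sample:
        sample.write("12345")

    csv_path = os.path.join(out, "listing.csv")
    write_listing(top, csv_path)
    with open(csv_path, newline="") as handle:
        rows = list(csv.reader(handle))
    print(rows[0])
```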

Depending on how much data you get back, you can massage it further using Pandas. Here are some things I found useful, especially if you're dealing with many levels of directories to look through.

Add these to your imports:

import numpy as np
import pandas as pd

Then add this to your code:

# Create DataFrame from the CSV file created above.
df = pd.read_csv(outputFile)

# Format columns
# Get the filename and file extension from the filepath
df['FileName'] = df['FilePath'].str.rsplit("/", n=1).str[-1]
df['FileExt'] = df['FileName'].str.rsplit('.', n=1).str[1]

# Get the full path to the files. If the path doesn't include a "/" it's the root directory
df['FullPath'] = df["FilePath"].str.rsplit("/", n=1).str[0]
df['FullPath'] = np.where(df['FullPath'].str.contains("/"), df['FullPath'], rootDir)

# Split the path into columns for the parent directory and its children
df['ParentDir'] = df['FullPath'].str.split("/", n=1).str[0]
df['SubDirs'] = df['FullPath'].str.split("/", n=1).str[1]
# Account for NaN returns, which indicate the path is the root directory
df['SubDirs'] = df['SubDirs'].fillna('')

# Determine if the item is a directory or file.
df['Type'] = np.where(df['Permissions'].str.startswith('d'), 'Dir', 'File')

# Split the time stamp into date and time columns
df[['ModifiedDate', 'Time']] = df.ModifiedTime.str.rsplit('+', n=1, expand=True)
df['Time'] = df['Time'].str.split('.').str[0]

# Show only files, output includes paths so you don't necessarily need to display the individual directories.
df = df[df['Type'].str.contains('File')]

# Set columns to show and their order.
df = df[['FileName', 'ParentDir', 'SubDirs', 'FullPath', 'FileExt', 'ModifiedDate', 'Time', 'Size']]

# To convert sizes to be more human readable (defined before it is used below)
def convert_bytes(size):
    for x in ['b', 'K', 'M', 'G', 'T']:
        if size < 1024:
            return "%3.1f %s" % (size, x)
        size /= 1024

    return "%3.1f %s" % (size, 'P')


# Convert the file sizes from bytes to something more readable.
df['Size'] = df['Size'].apply(convert_bytes)

# Send the data to an Excel workbook with sheets by parent directory
with pd.ExcelWriter("scripts_directory_contents.xlsx") as writer:
    for directory, data in df.groupby('ParentDir'):
        data.to_excel(writer, sheet_name=directory, index=False)
