
I am trying to list the files, along with their column counts and column names, from each subdirectory inside a directory.

Directory : dbfs:/mnt/adls/ib/har/
Sub Directory    2021-01-01
File                A.csv
File                B.csv
Sub Directory    2021-01-02
File                A1.csv
File                B1.csv

With the code below I am getting the error 'PosixPath' object is not iterable in the second for loop. Could someone help me out, please?

files = dbutils.fs.ls(f"dbfs:/mnt/adls/ib/har/")
for fi in files: 
  il=fi.path
  print(il)
  ill=Path(il)
  for fii in ill:
    if(".csv" in fii.path):
      df2 = spark.read.option("header","true").option("sep", ";").option("escape", "\"").csv(f"{fii.path}")
      m = df2.columns
      l = len(df2.columns)
      print(f"{fii.path} has, {l} columns, {m}")
      cols[fii.path] = l

maxkey = max(cols, key=cols.get)
maxvalue = cols.get(maxkey)

1 Answer


Please try the code below. Updated with the complete logic:

def get_dir_content(ls_path):
    for dir_path in dbutils.fs.ls(ls_path):
        if dir_path.isFile():
            yield dir_path.path
        elif dir_path.isDir() and ls_path != dir_path.path:
            yield from get_dir_content(dir_path.path)
    
my_list = list(get_dir_content('mnt/acct_vw'))
matchers = ['.csv']
matching = [s for s in my_list if any(xs in s for xs in matchers)]
print(matching)

2 Comments

Hi Karthikeyan, this displays only the date folders, but not the CSV files present inside the date folders.
Hi Ram, I have updated the answer with the full logic. @Ram
