
I am trying to list the files, along with their column counts and column names, from each subdirectory inside a directory.

Directory : dbfs:/mnt/adls/ib/har/
Sub Directory    2021-01-01
File                A.csv
File                B.csv
Sub Directory    2021-01-02
File                A1.csv
File                B1.csv

With the code below I am getting the error 'PosixPath' object is not iterable in the second for loop. Could someone help me out, please?

files = dbutils.fs.ls(f"dbfs:/mnt/adls/ib/har/")
for fi in files: 
  il=fi.path
  print(il)
  ill=Path(il)
  for fii in ill:
    if(".csv" in fii.path):
      df2 = spark.read.option("header","true").option("sep", ";").option("escape", "\"").csv(f"{fii.path}")
      m = df2.columns
      l = len(df2.columns)
      print(f"{fii.path} has, {l} columns, {m}")
      cols[fii.path] = l

maxkey = max(cols, key=cols.get)
maxvalue = cols.get(maxkey)

1 Answer


Please try the code below. Updated with the complete logic:

def get_dir_content(ls_path):
    for dir_path in dbutils.fs.ls(ls_path):
        if dir_path.isFile():
            yield dir_path.path
        elif dir_path.isDir() and ls_path != dir_path.path:
            yield from get_dir_content(dir_path.path)
    
my_list = list(get_dir_content('mnt/acct_vw'))
matchers = ['.csv']
matching = [s for s in my_list if any(xs in s for xs in matchers)]
print(matching)

2 Comments

Hi Karthikeyan, this displays only the date folders, but not the CSV files present inside the date folders.
Hi Ram, I have updated the answer with the full logic. @Ram
