
I am getting the error "No objects to concatenate". I cannot import the .csv files from main and its subdirectories and concatenate them into one DataFrame. I am using pandas. The old answers did not help me, so please do not mark this as a duplicate.

The folder structure looks like this:

main/*.csv
main/name1/name1/*.csv
main/name1/name2/*.csv
main/name2/name1/*.csv
main/name3/*.csv
import pandas as pd
import os
import glob

folder_selected = 'C:/Users/jacob/Documents/csv_files'
Attempt 1 (does not work):
frame = pd.concat(map(pd.read_csv, glob.iglob(os.path.join(folder_selected, "/*.csv"))))
Attempt 2 (does not work):
csv_paths = glob.glob('*.csv')
dfs = [pd.read_csv(folder_selected) for folder_selected in csv_paths]
df = pd.concat(dfs)
Attempt 3 (does not work):
all_files = glob.glob(folder_selected + "/*.csv")

file_path = []
for file in all_files:
    df = pd.read_csv(file, index_col=None, header=0)
    file_path.append(df)

frame = pd.concat(file_path, axis=0, ignore_index=False)
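Printing what the last pattern actually matches shows the problem; the list comes back empty because all of my files sit in subdirectories, so pd.concat has nothing to work with:

# Sanity check: a non-recursive "*.csv" only looks in folder_selected
# itself, so the match list is empty and pd.concat raises
# "No objects to concatenate".
print(glob.glob(folder_selected + "/*.csv"))  # -> []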

3 Answers


You need to search the subdirectories recursively.

folder = 'C:/Users/jacob/Documents/csv_files'
path = folder + "/**/*.csv"
Option 1: using glob.iglob
df = pd.concat(map(pd.read_csv, glob.iglob(path, recursive=True)))
Option 2: using glob.glob
csv_paths = glob.glob(path, recursive=True)
dfs = [pd.read_csv(csv_path) for csv_path in csv_paths]
df = pd.concat(dfs)
Option 3: using os.walk
import os
import fnmatch

file_paths = []
for base, dirs, files in os.walk(folder):
    for file in fnmatch.filter(files, '*.csv'):
        file_paths.append(os.path.join(base, file))
df = pd.concat([pd.read_csv(file) for file in file_paths])
Option 4: using pathlib
from pathlib import Path
files = Path(folder).rglob('*.csv')
df = pd.concat(map(pd.read_csv, files))
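One detail that applies to all four options: pd.concat keeps each file's own row index, so the combined frame usually contains duplicate index values. If you want a fresh 0..n-1 index instead, pass ignore_index=True, shown here with the dfs list from option 2:

# Same as option 2, but renumber the rows 0..n-1 instead of
# keeping each source file's original index.
df = pd.concat(dfs, ignore_index=True)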


Check out the Dask library, which reads many files into one DataFrame:

>>> import dask.dataframe as dd
>>> df = dd.read_csv('data*.csv')

Read their docs: https://examples.dask.org/dataframes/01-data-access.html#Read-CSV-files

1 Comment

This is a good answer. However, all of the files would need to be in the same directory. Also, to turn the result into a pandas DataFrame you need to add .compute() at the end, e.g. df = dd.read_csv('data*.csv').compute()
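Building on that comment: dd.read_csv also accepts an explicit list of paths, so pairing it with a recursive glob handles files spread across subdirectories. A minimal sketch, using the folder from the question:

import glob
import dask.dataframe as dd

# Build the file list ourselves, which sidesteps the
# single-directory limitation mentioned above.
paths = glob.glob('C:/Users/jacob/Documents/csv_files/**/*.csv', recursive=True)
ddf = dd.read_csv(paths)  # lazy dask DataFrame spanning all files
df = ddf.compute()        # materialize as a regular pandas DataFrame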

Python's pathlib is a good tool for such tasks:

from pathlib import Path

FOLDER_SELECTED = 'C:/Users/jacob/Documents/csv_files'

path = Path(FOLDER_SELECTED) / Path("main")

# grab all csvs in main and subfolders
df = pd.concat(pd.read_csv(f) for f in path.rglob("*.csv"))

Note:

If the CSVs need preprocessing, you can write a small read_csv wrapper function that deals with those issues and use it in place of pd.read_csv.
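For instance, a minimal sketch of such a wrapper; the header-whitespace cleanup is only a placeholder for whatever preprocessing your files actually need:

import pandas as pd

def read_csv_clean(f):
    # Hypothetical preprocessing -- adjust to whatever your files
    # actually require (encoding, separator, junk rows, ...).
    df = pd.read_csv(f)
    df.columns = [c.strip() for c in df.columns]  # tidy header whitespace
    return df

df = pd.concat(read_csv_clean(f) for f in path.rglob("*.csv"))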

