
I am getting the error "No objects to concatenate". I cannot import the .csv files from main and its subdirectories and concatenate them into one DataFrame. I am using pandas. The old answers did not help me, so please do not mark this as a duplicate.

The folder structure looks like this:

main/*.csv
main/name1/name1/*.csv
main/name1/name2/*.csv
main/name2/name1/*.csv
main/name3/*.csv
import pandas as pd
import os
import glob

folder_selected = 'C:/Users/jacob/Documents/csv_files'
Attempt 1 (does not work):
frame = pd.concat(map(pd.read_csv, glob.iglob(os.path.join(folder_selected, "/*.csv"))))
Attempt 2 (does not work):
csv_paths = glob.glob('*.csv')
dfs = [pd.read_csv(folder_selected) for folder_selected in csv_paths]
df = pd.concat(dfs)
Attempt 3 (does not work):
all_files = glob.glob(folder_selected + "/*.csv")

file_path = []
for file in all_files:
    df = pd.read_csv(file, index_col=None, header=0)
    file_path.append(df)

frame = pd.concat(file_path, axis=0, ignore_index=False)
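Printing what the last pattern actually matches shows the problem; the list comes back empty because all of my files sit in subdirectories, so pd.concat has nothing to work with:

# Sanity check: a non-recursive "*.csv" only looks in folder_selected
# itself, so the match list is empty and pd.concat raises
# "No objects to concatenate".
print(glob.glob(folder_selected + "/*.csv"))  # -> []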

3 Answers


You need to search the subdirectories recursively.

folder = 'C:/Users/jacob/Documents/csv_files'
path = folder + "/**/*.csv"
Option 1: using glob.iglob
df = pd.concat(map(pd.read_csv, glob.iglob(path, recursive=True)))
Option 2: using glob.glob
csv_paths = glob.glob(path, recursive=True)
dfs = [pd.read_csv(csv_path) for csv_path in csv_paths]
df = pd.concat(dfs)
Option 3: using os.walk
import os
import fnmatch

file_paths = []
for base, dirs, files in os.walk(folder):
    for file in fnmatch.filter(files, '*.csv'):
        file_paths.append(os.path.join(base, file))
df = pd.concat([pd.read_csv(file) for file in file_paths])
Option 4: using pathlib
from pathlib import Path
files = Path(folder).rglob('*.csv')
df = pd.concat(map(pd.read_csv, files))
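One detail that applies to all four options: pd.concat keeps each file's own row index, so the combined frame usually contains duplicate index values. If you want a fresh 0..n-1 index instead, pass ignore_index=True, shown here with the dfs list from option 2:

# Same as option 2, but renumber the rows 0..n-1 instead of
# keeping each source file's original index.
df = pd.concat(dfs, ignore_index=True)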


Check out the Dask library, which reads many files into one DataFrame:

>>> import dask.dataframe as dd
>>> df = dd.read_csv('data*.csv')

Read their docs: https://examples.dask.org/dataframes/01-data-access.html#Read-CSV-files

1 Comment

This is a good answer. However, all of the files would need to be in the same directory. Also, to turn the result into a pandas DataFrame you need to add .compute() at the end, e.g. df = dd.read_csv('data*.csv').compute()
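Building on that comment: dd.read_csv also accepts an explicit list of paths, so pairing it with a recursive glob handles files spread across subdirectories. A minimal sketch, using the folder from the question:

import glob
import dask.dataframe as dd

# Build the file list ourselves, which sidesteps the
# single-directory limitation mentioned above.
paths = glob.glob('C:/Users/jacob/Documents/csv_files/**/*.csv', recursive=True)
ddf = dd.read_csv(paths)  # lazy dask DataFrame spanning all files
df = ddf.compute()        # materialize as a regular pandas DataFrame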

Python's pathlib is a good tool for such tasks:

from pathlib import Path

FOLDER_SELECTED = 'C:/Users/jacob/Documents/csv_files'

path = Path(FOLDER_SELECTED) / Path("main")

# grab all csvs in main and subfolders
df = pd.concat(pd.read_csv(f) for f in path.rglob("*.csv"))

Note:

If the CSVs need preprocessing, you can write a small read_csv wrapper function that deals with those issues and use it in place of pd.read_csv.
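For instance, a minimal sketch of such a wrapper; the header-whitespace cleanup is only a placeholder for whatever preprocessing your files actually need:

import pandas as pd

def read_csv_clean(f):
    # Hypothetical preprocessing -- adjust to whatever your files
    # actually require (encoding, separator, junk rows, ...).
    df = pd.read_csv(f)
    df.columns = [c.strip() for c in df.columns]  # tidy header whitespace
    return df

df = pd.concat(read_csv_clean(f) for f in path.rglob("*.csv"))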

