
I'm trying to combine CSV files in a folder to analyze them. Also, I want to append the filename of each as a column so I can figure out which data came from which file. I've looked at the similar questions and none of them have worked for me.

Here's the code I'm using. There are 24 CSV files in this folder, and since combining the files later would be easy with cat, even a method that just tells me how to append the filename inside each file would be perfect. Any help would be great.

import pandas as pd
import os
import glob
import csv
path=r'/home/videept/Downloads/A_DeviceMotion_data/A_DeviceMotion_data/dws_1/'
with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)

    for filename in glob.glob(os.path.join(path,"*.csv")):
        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)

            for row in csv_input:
                row.insert(0, filename)
                csv_output.writerow(row)

When I run this, the cell goes into an infinite loop and no new file is even created. I'm not sure how to see the progress of what's going on, so any ideas on that would also be great. Thanks :)
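(One common cause of this symptom is that output.csv itself matches the *.csv glob — e.g. if the script's working directory is the globbed folder — so the loop keeps reading the file it is writing. A minimal sketch of the same stdlib-csv loop with a guard and a progress print; the folder and file names here are illustrative, not the paths above:)

```python
import csv
import glob
import os

# Illustrative setup: a folder with one small CSV standing in for the real data
path = "csv_folder"
os.makedirs(path, exist_ok=True)
with open(os.path.join(path, "sample.csv"), "w", newline="") as f:
    csv.writer(f).writerows([["t", "x"], ["0", "1"]])

out_path = "output.csv"
with open(out_path, "w", newline="") as f_output:
    csv_output = csv.writer(f_output)
    for filename in glob.glob(os.path.join(path, "*.csv")):
        # Guard: never read the file we are currently writing
        if os.path.abspath(filename) == os.path.abspath(out_path):
            continue
        print("Processing", filename)  # progress trace
        with open(filename, newline="") as f_input:
            for row in csv.reader(f_input):
                row.insert(0, os.path.basename(filename))
                csv_output.writerow(row)
```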

  • I would add a print("Processing", filename, "...") just before the with open(filename, newline='')... line to be sure whether one file is blocking everything. If not enough, I would also add a trace every n rows with something like: for i, row in enumerate(csv_input): if (0 == i % n): print('.', end='') ... Commented Jul 4, 2019 at 7:45
  • use print() to see what you have in variables - ie. filename and row Commented Jul 4, 2019 at 7:46
  • are they single-column CSV files? and what version of python you are using? Commented Jul 4, 2019 at 7:47
  • Thanks Serge, this helped me figure out where I was getting stuck. Appreciate it :) Commented Jul 4, 2019 at 8:25

2 Answers

6

I would do it this way (provided you are using Python 3.4+):

import pandas as pd
from pathlib import Path

source_files = sorted(Path('path_to_source_directory').glob('*.csv'))

dataframes = []
for file in source_files:
    df = pd.read_csv(file) # additional arguments up to your needs
    df['source'] = file.name
    dataframes.append(df)

df_all = pd.concat(dataframes)

This way, every row has a column that identifies its source file, for easy filtering and analysis.
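For instance, the source column then supports per-file filtering and summaries (the file names and data below are purely illustrative):

```python
import pandas as pd

# Two tiny frames standing in for two loaded CSV files
df_a = pd.DataFrame({"x": [1, 2]})
df_a["source"] = "a.csv"
df_b = pd.DataFrame({"x": [10, 20]})
df_b["source"] = "b.csv"

df_all = pd.concat([df_a, df_b], ignore_index=True)

# Keep only the rows that came from one file
only_a = df_all[df_all["source"] == "a.csv"]

# Per-file summary statistics
sums = df_all.groupby("source")["x"].sum()
```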


2 Comments

Thank you so much, this worked. It wasn't appending to the same csv file, and that's what confused me. When I wrote the dataframe to another csv the column appeared. Thanks :)
I'm glad that I could help you
1

First, make sure that all the csv files have the same structure. Then make sure that you can read one csv file properly. Then you can do it iteratively:

import pandas as pd
import glob

frames = []

for f in glob.glob("path/to/csv/files/prefix_*.csv"):

    df = pd.read_csv(f) # make sure to apply correct settings (sep, parse_dates, headers, missing_values)
    df["origin"] = f # add a column with the csv name
    frames.append(df) # collect each df; note that DataFrame.append was removed in pandas 2.0

df_all = pd.concat(frames) # combine everything into the "master" dataframe

df_all.to_csv("merged.csv")

UPDATE: If you are afraid the data won't all fit in memory, take a look at the Dask library.
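Short of Dask, pandas can also stream each file in chunks so only one chunk is in memory at a time; a sketch of that approach (the paths and sample data below are illustrative):

```python
import glob
import os
import pandas as pd

# Illustrative setup: two small source files standing in for the real data
os.makedirs("src_csvs", exist_ok=True)
pd.DataFrame({"x": [1, 2]}).to_csv("src_csvs/prefix_a.csv", index=False)
pd.DataFrame({"x": [3, 4]}).to_csv("src_csvs/prefix_b.csv", index=False)

# Stream each file chunk by chunk, appending to one output file,
# so memory use is bounded by the chunk size rather than the total data
with open("chunked_merged.csv", "w", newline="") as out:
    first = True
    for f in sorted(glob.glob("src_csvs/prefix_*.csv")):
        for chunk in pd.read_csv(f, chunksize=1000):
            chunk["origin"] = os.path.basename(f)
            chunk.to_csv(out, header=first, index=False)
            first = False  # write the header only once
```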

2 Comments

Beware, you are loading everything in memory. It might crash with too many and/or too big files...
Or simply use the csv module, which allows you to process one row at a time, whatever the number and sizes of the files...
