
I have multiple CSV files saved in one folder, all with the same column layout, and want to load them into Python as a pandas DataFrame.

The question is really similar to this thread.

I am using the following code:

import glob
import pandas as pd
salesdata = pd.DataFrame()
for f in glob.glob(r"TransactionData\Promorelevant\*.csv"):
    appenddata = pd.read_csv(f, header=None, sep=";")
    salesdata = salesdata.append(appenddata, ignore_index=True)

Is there a better solution, perhaps with another package?

This is taking too much time.

Thanks

4 Answers


I suggest using a list comprehension with concat:

import glob
import pandas as pd

files = glob.glob(r"TransactionData\Promorelevant\*.csv")
dfs = [pd.read_csv(f, header=None, sep=";") for f in files]

salesdata = pd.concat(dfs, ignore_index=True)
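For illustration, the same one-read-per-file pattern can be sketched end to end with pathlib; the `promorelevant_demo` folder and the two toy files below are invented stand-ins for the real `TransactionData\Promorelevant` data:

```python
from pathlib import Path
import pandas as pd

# Toy stand-in for the real folder: two semicolon-separated CSVs
# with the same column layout and no header row.
folder = Path("promorelevant_demo")
folder.mkdir(exist_ok=True)
(folder / "0.csv").write_text("1;a\n2;b\n")
(folder / "1.csv").write_text("3;c\n4;d\n")

# One read per file and a single concat at the end -- this avoids the
# repeated copying that growing a DataFrame with append causes.
dfs = [pd.read_csv(f, header=None, sep=";") for f in sorted(folder.glob("*.csv"))]
salesdata = pd.concat(dfs, ignore_index=True)
print(salesdata.shape)  # (4, 2)
```

Sorting the glob results keeps the row order deterministic, since glob makes no ordering guarantee.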

3 Comments

pd.read_csv can load data from a file path; any specific reason for using glob?
@Shiva - Yes, glob returns all the file paths, so it is necessary.
This is barely any different from the question? Though I guess concat can be faster than append (or does append use concat behind the scenes?). As a single operation it may be better optimized.

With help from the link to the actual answer:

This seems to be the best one liner:

import glob, os
import pandas as pd

# '' is the folder to search; replace it with your directory
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "*.csv"))))

1 Comment

Could you explain your solution?

Maybe using bash will be faster:

head -n 1 "TransactionData/Promorelevant/0.csv" > merged.csv
tail -q -n +2 TransactionData/Promorelevant/*.csv >> merged.csv

Or if using from within a jupyter notebook

!head -n 1 "TransactionData/Promorelevant/0.csv" > merged.csv
!tail -q -n +2 TransactionData/Promorelevant/*.csv >> merged.csv

The idea being that you won't need to parse anything.

The first command copies the header of one of the files. You can skip this line if you don't have a header. tail then skips the header of every file and appends the remaining rows to the merged CSV.

Appending in Python is probably more expensive.
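For comparison, a similar no-parse merge can be sketched in pure Python as well; the `merge_demo` folder and toy files below are invented stand-ins for the real data, but the idea is the same as the bash version: copy one header, then append raw data rows without ever parsing them:

```python
import glob
import os

# Toy stand-in for TransactionData/Promorelevant: two CSVs sharing a header.
os.makedirs("merge_demo", exist_ok=True)
with open("merge_demo/0.csv", "w") as f:
    f.write("id;val\n1;a\n2;b\n")
with open("merge_demo/1.csv", "w") as f:
    f.write("id;val\n3;c\n4;d\n")

files = sorted(glob.glob("merge_demo/*.csv"))
with open("merged.csv", "w") as out:
    with open(files[0]) as first:
        out.write(first.readline())   # copy the header line once
    for path in files:
        with open(path) as f:
            f.readline()              # skip this file's header
            out.write(f.read())       # append the data rows verbatim
```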

Of course, make sure the merged file still parses correctly with pandas:

pd.read_csv("merged.csv")

Curious to see your benchmark.

6 Comments

@PV8 What do you not understand? The first line copies the header of one of the files to merged.csv; the second line appends all CSVs while omitting their headers. Since it does not have to parse anything it will be lightning fast.
Assuming my files in the folder are named 0.csv, 1.csv and so on, and the folder path is still TransactionData\Promorelevant, what do I have to write to use your code?
@PV8 Updated the example with the folder, does it work?
the ! is identified as invalid syntax; I have to run this in a Jupyter notebook, right?
@PV8 Updated to add quotes and better slashes, and you have to have the ! in front if using Jupyter. The server the notebook is running on is Linux-based, right?

I checked all of these approaches except the bash one with the time function (only one run; note also that the files are on a shared drive).

Here are the results:

My approach: 1220.49

List comprehension + concat: 1135.53

concat+map+join: 1116.31

I will go for list comprehension + concat, which saves me some minutes and which I feel quite familiar with.

Thanks for your ideas.

