0

I am new to Python and am looking for a simple solution.

I have several .csv files with the same structure (number of columns and lines) in one folder. The path is: C:\temp

Now I want to read all these .csv files into a new dataframe, which I want to export later as a new .csv file.

up to now i have read each .csv file by hand and saved it into a pandas dataframe.

Here is an example:

df1 = pd.read_csv(r "C:\temp\df1.csv", header= None)
df2 = pd.read_csv(r "C:\temp\df2.csv", header= None)

df1

0 id Feature
1 1 12
2 2 13
3 3 14
4 4 15
5 5 16
6 7 17
7 8 15
8 9 12
9 10 13
10 11 23

Then I used .append to merge the dataframes.

df_new = df1.append(df2)

0   id  Feature
1   1   12
2   2   13
3   3   14
4   4   15
5   5   16
6   7   17
7   8   15
8   9   12
9   10  13
10  11  23
0   id  Feature
1   1   14
2   2   9
3   3   3
4   4   8
5   5   9
6   7   1
7   8   32
8   9   7
9   10  3
10  11  12

df_new.to_csv('df_new.csv', index=False)

Unfortunately this version always has the header with me, but I don't need it. So I deleted it afterwards by hand.

Isn't there a faster version? I'm thinking of a for loop that opens all existing .csv files in the path and reads them line by line into a new dataframe and at the end of the loop makes a .csv file out of it? Unfortunately I have no experience with loops.

I appreciate your help.

2 Answers 2

1

You could do something like this:

frames = []
for i in range(0,10):
    frames.append(pd.read_csv(r "C:\temp\df%d.csv" % i))

df = pd.concat(frames)

Which should concatenate all csv files into one dataframe and which works if all filenames are structured like df1d.csv, df2d.csv etc. and you know the amount of files. If you just want to loop over all files in the folder, you could do something like this:

import glob

path = "C:\temp\*.csv"

frames = []
for fname in glob.glob(path):
    frames.append(pd.read_csv(fname))

df = pd.concat(frames)

The extended version, without a loop would be:

df1 = pd.read_csv(r "C:\temp\df1.csv")
df2 = pd.read_csv(r "C:\temp\df2.csv")
df3 = pd.read_csv(r "C:\temp\df3.csv")
frames = [df1, df2, df3]
df = pd.concat(frames)

Afterwards you could save the dataframe back to a csv file:

df.to_csv(file_name)
Sign up to request clarification or add additional context in comments.

1 Comment

These solutions are exactly what I was looking for! Thanks a lot! Amazing Community!
0
In [1]: import pandas as pd

In [2]: from io import StringIO

In [3]: df = pd.read_csv(StringIO("""0 id Feature
   ...: 1 1 12
   ...: 2 2 13
   ...: 3 3 14
   ...: 4 4 15
   ...: 5 5 16
   ...: 6 7 17
   ...: 7 8 15
   ...: 8 9 12
   ...: 9 10 13
   ...: 10 11 23"""), sep=' ')

In [4]: df1 = pd.read_csv(StringIO("""0   id  Feature
   ...: 1   1   14
   ...: 2   2   9
   ...: 3   3   3
   ...: 4   4   8
   ...: 5   5   9
   ...: 6   7   1
   ...: 7   8   32
   ...: 8   9   7
   ...: 9   10   3
   ...: 10   11   12"""), sep='   ')

In [10]: pd.concat([df, df1])
Out[10]: 
    0  id  Feature
0   1   1       12
1   2   2       13
2   3   3       14
3   4   4       15
4   5   5       16
5   6   7       17
6   7   8       15
7   8   9       12
8   9  10       13
9  10  11       23
0   1   1       14
1   2   2        9
2   3   3        3
3   4   4        8
4   5   5        9
5   6   7        1
6   7   8       32
7   8   9        7
8   9  10        3
9  10  11       12

In [11]: %timeit pd.concat([df, df1])

188 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [14]: df.append(df1)
Out[14]: 
    0  id  Feature
0   1   1       12
1   2   2       13
2   3   3       14
3   4   4       15
4   5   5       16
5   6   7       17
6   7   8       15
7   8   9       12
8   9  10       13
9  10  11       23
0   1   1       14
1   2   2        9
2   3   3        3
3   4   4        8
4   5   5        9
5   6   7        1
6   7   8       32
7   8   9        7
8   9  10        3
9  10  11       12

In [15]: %timeit df.append(df1)
197 µs ± 4.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

With pandas version '1.1.3'

You can clearly check that pd.concat is faster than df.append(df1).

For working with loops, you can create a variable with the filenames and keep a list of dataframes from those files using for loop, something like this

filename = ['1.csv', '2.csv']

dfs = []

for name in filename:
    dfs.append(pd.read_csv(name))

new_df = pd.concat(dfs)

This is easy, efficient, cleaner and faster as well.

And then save the file to csv.

new_df.to_csv(out_filename)

1 Comment

These solutions are exactly what I was looking for! Thanks a lot! Amazing Community!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.