python read csv files with same basename and save as different dataframes

Question

I have 20 csv files with the same basename followed by a number from 100 to 2000 in increments of 100: samp_100.csv, samp_200.csv, samp_300.csv, ..., samp_1900.csv, samp_2000.csv.

I am trying to read these files into Python with the following code:

T = np.arange(100,2100,100)
for i in T: 
    df = pd.read_csv("samp_{i}.csv".format(i=i))

Although I do not get an error, the files aren't read in the correct order from 100 to 2000. When I use df.head, I do not see the first lines of the file samp_100.csv. Also the files are concatenated into a single file called df. Is there an equivalent way to achieve this but instead have 20 separate dataframes with the names df_100, df_200, ..., df_1900, df_2000?

You could put them all in a list, e.g. df_list.append(pd.read_csv("samp_{i}.csv".format(i=i)))
– Nick, Commented Sep 25, 2022 at 1:05
I think the issue is when I read the files. The order seems to have changed. I cannot manipulate the dataframe if my 20 files are concatenated in ascending order from 100 to 2000.
– user19619903, Commented Sep 25, 2022 at 1:10
What you're seeing is the data from samp_2000.csv because you're overwriting df in every pass of the loop. The files will be read in order (just try changing df = pd.read_csv(...) to print(...) and you'll see).
– Nick, Commented Sep 25, 2022 at 1:17
Oh I see. Thank you for the explanation. So to save all of the files, should I put them in a list inside the loop, as you recommended? Would you mind explaining how to do it? Thanks
– user19619903, Commented Sep 25, 2022 at 1:21
That would probably be the best solution unless you can completely process them in the loop.
– Nick, Commented Sep 25, 2022 at 1:22
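
A minimal sketch of the idea from the comments above, using a dict rather than a list so that each dataframe can be looked up by its number (frames[100] stands in for df_100). The temporary directory and the tiny generated files are assumptions made only so the example runs on its own; `data_dir` and `frames` are hypothetical names, not part of the original thread.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumption: create small stand-in files so the sketch is self-contained.
data_dir = Path(tempfile.mkdtemp())
for i in range(100, 2100, 100):
    pd.DataFrame({"a": [i, i + 1], "b": [i + 2, i + 3]}).to_csv(
        data_dir / f"samp_{i}.csv", index=False
    )

# Store each dataframe under its number instead of overwriting one variable.
frames = {}
for i in range(100, 2100, 100):
    frames[i] = pd.read_csv(data_dir / f"samp_{i}.csv")

print(len(frames))         # number of dataframes kept
print(frames[100].head())  # contents of samp_100.csv
```

A dict is usually preferable to creating 20 separately named variables, because the dataframes can still be processed in a loop, e.g. for number, frame in frames.items().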

Timeless · Accepted Answer · 2022-09-25 01:46:54Z

To combine all the files into a single dataframe, collect them in a list and pass the list to pandas.concat.

Try this:

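import numpy as np
import pandas as pd

# File numbers 100, 200, ..., 2000
T = np.arange(100, 2100, 100)

# Read each file and collect the dataframes in a list
list_of_df = []
for i in T:
    temp_df = pd.read_csv(f"samp_{i}.csv")
    list_of_df.append(temp_df)

# Stack all dataframes row-wise into a single one
df = pd.concat(list_of_df, axis=0, ignore_index=True)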

If you need to add a column with the name of the .csv, include the line below after calling pandas.read_csv inside the loop.

temp_df.insert(0, "filename", f"samp_{i}")
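
For illustration, here is a sketch of the loop with the filename column in place, followed by one way to pull a single file's rows back out of the combined dataframe. The in-memory stand-ins (`fake_files`), the column names x and y, and the name `only_200` are assumptions made so the example runs without any files on disk.

```python
import io

import pandas as pd

# Assumption: in-memory stand-ins for three of the files.
fake_files = {
    i: io.StringIO(f"x,y\n{i},1\n{i},2\n") for i in (100, 200, 300)
}

list_of_df = []
for i, handle in fake_files.items():
    temp_df = pd.read_csv(handle)
    # Tag every row with the file it came from before concatenating.
    temp_df.insert(0, "filename", f"samp_{i}")
    list_of_df.append(temp_df)

df = pd.concat(list_of_df, axis=0, ignore_index=True)

# Recover the rows of a single file from the combined dataframe.
only_200 = df[df["filename"] == "samp_200"]
print(only_200)
```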

edited Sep 25, 2022 at 1:46

answered Sep 25, 2022 at 1:41


6 Comments

user19619903 Over a year ago

Hello, thank you. When I run this all my dataframe values become NaN, but the dimensions are correct.

Timeless Over a year ago

I ran the code on my machine and I didn't get NaN values. Can you elaborate?

user19619903 Over a year ago

Yes! I wish I could provide a screenshot. When I do df.shape I get 500000 rows × 203 columns. Each one of my files has 25000 rows × 13 columns, so my dimensions should be 500000 rows × 13 columns if concatenated by rows. When I print df, all my cell values become NaN. The column names, however, have numerical values. Please let me know if I can clarify anything else. Thank you!!

Timeless Over a year ago

Can you show with a screenshot (in your post) what one of your .csv files looks like? Do they have the same column names?

Timeless Over a year ago

You did well by adding the argument header=None. Happy coding!
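
A likely explanation for the NaN values, sketched under the assumption that the files had no header row: by default pandas.read_csv treats the first line of each file as column names, so every file ends up with different (numeric-looking) column names, and pandas.concat aligns on those names and fills the gaps with NaN. The two small CSV strings below are made up for illustration.

```python
import io

import pandas as pd

# Assumption: two header-less files with made-up numeric values.
csv_a = "1.0,2.0\n3.0,4.0\n"
csv_b = "5.0,6.0\n7.0,8.0\n"

# Default: the first row of each file is consumed as column names.
bad = pd.concat(
    [pd.read_csv(io.StringIO(csv_a)), pd.read_csv(io.StringIO(csv_b))],
    ignore_index=True,
)

# header=None: columns are numbered 0, 1, ... and every row is kept as data.
good = pd.concat(
    [
        pd.read_csv(io.StringIO(csv_a), header=None),
        pd.read_csv(io.StringIO(csv_b), header=None),
    ],
    ignore_index=True,
)

print(bad.shape, int(bad.isna().sum().sum()))
print(good.shape, int(good.isna().sum().sum()))
```

In the first case the column count grows with every file and the first row of each file is lost to the header, which matches the symptom of too many columns with numeric names.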
