
I have 20 CSV files with the same base name and a number from 100 to 2000 in steps of 100: samp_100.csv, samp_200.csv, samp_300.csv, ..., samp_1900.csv, samp_2000.csv.

I am trying to read these files into Python with the following code:

import numpy as np
import pandas as pd

T = np.arange(100, 2100, 100)  # 100, 200, ..., 2000
for i in T:
    df = pd.read_csv("samp_{i}.csv".format(i=i))

Although I do not get an error, the files don't seem to be read in the correct order from 100 to 2000: when I call df.head(), I do not see the first lines of samp_100.csv. Also, all the files end up in a single DataFrame called df. Is there an equivalent way to get 20 separate DataFrames named df_100, df_200, ..., df_1900, df_2000 instead?
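Rather than creating 20 variables named df_100 to df_2000, a common Python pattern is to keep the DataFrames in a dictionary keyed by the number in the file name, so dfs[100] plays the role of df_100. A minimal sketch, assuming all files sit in one folder; the helper read_samples and the two-column sample data are hypothetical and exist only to make the example runnable:

```python
import os
import tempfile

import pandas as pd


def read_samples(folder, numbers):
    """Read samp_<n>.csv for each n into a dict keyed by n (in the order given)."""
    return {n: pd.read_csv(os.path.join(folder, f"samp_{n}.csv")) for n in numbers}


# Demo with small hypothetical files standing in for the real samp_*.csv data.
with tempfile.TemporaryDirectory() as folder:
    numbers = range(100, 2001, 100)  # 100, 200, ..., 2000 (20 values)
    for n in numbers:
        pd.DataFrame({"step": [n, n + 1], "value": [0.5, 1.5]}).to_csv(
            os.path.join(folder, f"samp_{n}.csv"), index=False
        )

    # The data is loaded into memory, so it survives the folder's deletion.
    dfs = read_samples(folder, numbers)

print(len(dfs))         # 20
print(dfs[100].head())  # first rows of samp_100.csv
```

A dictionary also makes it easy to loop over all files later (for n, df in dfs.items(): ...), which is awkward with 20 separately named variables.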

  • You could put them all in a list e.g. df_list.append(pd.read_csv("samp_{i}.csv".format(i=i))) Commented Sep 25, 2022 at 1:05
  • I think the issue is when I read the files. The order seems to have changed. I cannot manipulate the dataframe if my 20 files are concatenated in ascending order from 100 to 2000. Commented Sep 25, 2022 at 1:10
  • What you're seeing is the data from samp_2000.csv because you're overwriting df in every pass of the loop. The files will be read in order (just try changing df = pd.read_csv(...) to print(...) and you'll see). Commented Sep 25, 2022 at 1:17
  • Oh I see. Thank you for the explanation. So to keep all of the files, should I put them in a list inside the loop, as you recommended? Would you mind explaining how to do it? Thanks Commented Sep 25, 2022 at 1:21
  • That would probably be the best solution unless you can completely process them in the loop. Commented Sep 25, 2022 at 1:22
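Both points from the comments can be checked directly: the loop opens the files in ascending order, df ends up holding only the last file because it is rebound on every pass, and appending to a list keeps all 20. A self-contained sketch using small hypothetical files in a temporary folder (the column name source is made up for the demo):

```python
import os
import tempfile

import numpy as np
import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical stand-in files: each holds a single row tagged with its number.
    T = np.arange(100, 2100, 100)  # the stop value is exclusive, so this is 100..2000
    for i in T:
        pd.DataFrame({"source": [i]}).to_csv(
            os.path.join(folder, f"samp_{i}.csv"), index=False
        )

    read_order = []
    df_list = []
    for i in T:
        path = os.path.join(folder, f"samp_{i}.csv")
        read_order.append(os.path.basename(path))  # record the order files are opened
        df = pd.read_csv(path)  # rebinds df on every pass...
        df_list.append(df)      # ...but the list keeps every DataFrame

print(read_order[:3])        # ['samp_100.csv', 'samp_200.csv', 'samp_300.csv']
print(df["source"].iloc[0])  # 2000: df only holds the last file
print(len(df_list))          # 20
```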

1 Answer


Collect the DataFrames in a list and combine them with pandas.concat.

Try this:

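import numpy as np
import pandas as pd

T = np.arange(100, 2100, 100)  # 100, 200, ..., 2000

list_of_df = []
for i in T:
    temp_df = pd.read_csv(f"samp_{i}.csv")
    list_of_df.append(temp_df)  # keep every file instead of overwriting

# Stack all 20 DataFrames vertically and renumber the rows from 0
df = pd.concat(list_of_df, axis=0, ignore_index=True)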

If you need to add a column with the name of the .csv, include the line below after calling pandas.read_csv inside the loop.

temp_df.insert(0, "filename", f"samp_{i}")
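As an alternative to inserting the column inside the loop, pandas.concat can build it: when you pass a dict of DataFrames, the keys become an outer index level, which reset_index can turn into an ordinary column. A sketch with small made-up DataFrames in place of the real files; the level names filename and row are arbitrary choices:

```python
import pandas as pd

# Hypothetical small DataFrames standing in for pd.read_csv(f"samp_{i}.csv").
frames = {
    f"samp_{i}": pd.DataFrame({"a": [i, i + 1], "b": [0.1, 0.2]})
    for i in range(100, 2001, 100)
}

# The dict keys become the outer index level, named "filename" here.
# reset_index(level="filename") moves that level into a column at position 0,
# and reset_index(drop=True) renumbers the rows from 0.
df = (
    pd.concat(frames, names=["filename", "row"])
    .reset_index(level="filename")
    .reset_index(drop=True)
)

print(df.shape)  # (40, 3)
print(df.head(3))
```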

Comments

Hello, thank you. When I run this, all my DataFrame values become NaN, but the dimensions are correct.
I ran the code on my machine and didn't get NaN values. Can you elaborate?
Yes! I wish I could provide a screenshot. When I check df.shape, I get 500000 rows × 203 columns. Each of my files has 25000 rows × 13 columns, so the result should be 500000 rows × 13 columns when concatenated by rows. When I print df, all cell values are NaN. The column names, however, are numerical values. Please let me know if I can clarify anything else. Thank you!!
Can you show (in your post) what one of your .csv files looks like? Do they all have the same column names?
You did well by adding the argument header=None. Happy coding!
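The NaN values, the inflated column count and the numeric column names are consistent with headerless files: by default read_csv treats the first row of each file as column names, those "names" differ from file to file, and pandas.concat aligns on column labels, so it builds a union of all labels and pads the gaps with NaN. This diagnosis is inferred from the symptoms and the header=None fix, not confirmed from the actual files. A minimal reproduction with two tiny hypothetical CSV texts:

```python
import io

import pandas as pd

# Two hypothetical headerless CSV files: every line is data, none is a header.
csv_texts = ["1,2,3\n4,5,6\n", "7,8,9\n10,11,12\n"]

# Default header="infer": the first row of each file becomes its column names.
# The names differ between files, so concat takes the union of labels and fills NaN.
bad = pd.concat([pd.read_csv(io.StringIO(t)) for t in csv_texts], ignore_index=True)
print(bad.shape)  # (2, 6): one row lost per file, 6 columns instead of 3

# header=None: every row is data and all files share the labels 0, 1, 2.
good = pd.concat(
    [pd.read_csv(io.StringIO(t), header=None) for t in csv_texts], ignore_index=True
)
print(good.shape)  # (4, 3)
```

If the real files do have a header row but the names vary slightly, the fix is different (for example, passing the same names=[...] list to every read_csv call), so it is worth opening one file and checking its first line.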
