Merge multiple CSV files using Python pandas

Question

I am new to Python and want to merge multiple CSV files. I have two files, as follows:

CSV1:
startp endp slack
S1 E1 -0.15
S4 E2 -10
S3 E3 -3.2

CSV2:
startp endp slack
S1 E1 -0.12
S2 E2 -4
S3 E3 -1.2

Merged CSV (the result I want):
startp endp slack_csv1 slack_csv2
S1 E1 -0.15 -0.12
S4 E2 -10 
S2 E2        -4
S3 E3 -3.2 -1.2

I wrote this code:
    for file_name in all_csv"
        df=pd.read_csv(file_name)
        if i==0"
      df_t = df
      i=1
df_t=pd.merge(df_t,df)
print("after merge", df_t,df)

The output for df_t is empty after the second merge. If I try to merge with on=endp, I get an error. How can I do this?

Stuart · Accepted Answer · 2020-03-14 05:42:53Z

Note: I'm assuming your CSVs have commas where you put spaces.
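If the files really are whitespace-separated, as they appear in the question, read_csv can still parse them when given a regex separator. A minimal sketch, using an in-memory string as a stand-in for the asker's first file (the variable names here are illustrative only):

```python
import io

import pandas as pd

# Stand-in for csv1 exactly as posted in the question (space-separated).
raw_csv1 = """startp endp slack
S1 E1 -0.15
S4 E2 -10
S3 E3 -3.2
"""

# sep=r"\s+" splits fields on any run of whitespace instead of on commas.
df1 = pd.read_csv(io.StringIO(raw_csv1), sep=r"\s+")
print(df1.columns.tolist())
```

With real files, the same sep argument would be passed alongside the file name instead of the StringIO object.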

I'm not sure what the " was for after the for statement, but I formatted it to what I think you meant to write.

You were close to having it. You just have to specify some more parameters for the pd.merge() function.
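To see why the original call came back empty: with no extra arguments, pd.merge() does an inner join on every column the two frames share, which here includes slack. The sketch below rebuilds the two tables from the question in memory (assuming comma-separated files with these three columns) and compares the default call with a join on the key columns only.

```python
import pandas as pd

# The two tables from the question, built in memory for illustration.
df1 = pd.DataFrame({"startp": ["S1", "S4", "S3"],
                    "endp": ["E1", "E2", "E3"],
                    "slack": [-0.15, -10.0, -3.2]})
df2 = pd.DataFrame({"startp": ["S1", "S2", "S3"],
                    "endp": ["E1", "E2", "E3"],
                    "slack": [-0.12, -4.0, -1.2]})

# Default: how="inner", joining on all shared columns (startp, endp AND slack).
# No row has the same slack value in both tables, so nothing matches.
default_merge = pd.merge(df1, df2)
print(default_merge.empty)

# Join on the key columns only and keep rows that appear in just one table.
outer_merge = pd.merge(df1, df2, on=["startp", "endp"], how="outer")
print(len(outer_merge))
```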

You may want to set the suffixes parameter to values that indicate which file each column came from (see below for an extended answer that addresses this).

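import pandas as pd

all_csv = ["csv1.csv", "csv2.csv"]
i = 0
for file_name in all_csv:
    df = pd.read_csv(file_name)
    if i == 0:
        # First file: nothing to merge with yet.
        df_t = df
        i = 1
    else:
        # Join on the key columns only; 'outer' keeps rows found in just one file.
        # The fixed suffixes ('_1', '_2') suit exactly two files.
        df_t = pd.merge(df_t, df, on=['startp', 'endp'], how='outer', suffixes=('_1', '_2'))

print("after merge", df_t)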

outputs:

  startp endp  slack_1  slack_2
0     S1   E1    -0.15    -0.12
1     S4   E2   -10.00      NaN
2     S3   E3    -3.20    -1.20
3     S2   E2      NaN    -4.00
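The suffixes in the merge above could also be built from the file names, so the columns come out as slack_csv1 and slack_csv2 like the table the question asks for. A small sketch, assuming the file names csv1.csv and csv2.csv and using in-memory frames in place of the real files:

```python
from pathlib import Path

import pandas as pd

file_1, file_2 = "csv1.csv", "csv2.csv"  # assumed file names

# Path(...).stem drops the extension: "csv1.csv" -> "csv1".
suffixes = tuple("_" + Path(name).stem for name in (file_1, file_2))

# In-memory stand-ins for pd.read_csv(file_1) and pd.read_csv(file_2).
df_1 = pd.DataFrame({"startp": ["S1", "S4", "S3"],
                     "endp": ["E1", "E2", "E3"],
                     "slack": [-0.15, -10.0, -3.2]})
df_2 = pd.DataFrame({"startp": ["S1", "S2", "S3"],
                     "endp": ["E1", "E2", "E3"],
                     "slack": [-0.12, -4.0, -1.2]})

merged = pd.merge(df_1, df_2, on=["startp", "endp"], how="outer", suffixes=suffixes)
print(merged.columns.tolist())
```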

Alternate solution that handles more than 2 files

Here I use pd.DataFrame.merge() instead of pd.merge(), but they accomplish the same task. I rename the slack column to include the file name before merging, so every file contributes a uniquely named column and no suffixes are needed. That is what lets more than 2 files be combined. This is just one way; you could also adapt your earlier code to handle more than 2 files.

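import pandas as pd

all_csv = ["csv1.csv", "csv2.csv"]  # add more file names as needed
df_combined = None
for csv_file in all_csv:
    df = pd.read_csv(csv_file)
    # Give this file's slack column a unique name, e.g. 'slack_csv1.csv'.
    df = df.rename(columns={'slack': 'slack_' + csv_file})
    if df_combined is None:
        df_combined = df.copy()
    else:
        # Join on the key columns; 'outer' keeps rows found in only some files.
        df_combined = df_combined.merge(df, how='outer', on=['startp', 'endp'])

print(df_combined)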
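The same rename-then-merge idea can also be written with functools.reduce rather than an explicit loop. This is a variant, not the code above: the frames are built in memory so the sketch runs on its own, and the third table (csv3) and its values are invented purely to show more than two inputs.

```python
from functools import reduce

import pandas as pd

# Hypothetical data keyed by label; in practice each frame would come
# from pd.read_csv(...). "csv3" is made up for illustration.
frames = {
    "csv1": pd.DataFrame({"startp": ["S1", "S4"], "endp": ["E1", "E2"],
                          "slack": [-0.15, -10.0]}),
    "csv2": pd.DataFrame({"startp": ["S1", "S2"], "endp": ["E1", "E2"],
                          "slack": [-0.12, -4.0]}),
    "csv3": pd.DataFrame({"startp": ["S1", "S4"], "endp": ["E1", "E2"],
                          "slack": [-0.5, -9.0]}),
}

# Rename each slack column so it carries its source label.
renamed = [df.rename(columns={"slack": f"slack_{label}"})
           for label, df in frames.items()]

# Fold the list into one frame, outer-joining on the key columns each time.
df_combined = reduce(
    lambda left, right: left.merge(right, how="outer", on=["startp", "endp"]),
    renamed,
)
print(df_combined)
```

Row order in an outer merge can differ between pandas versions, so sort on the key columns if a stable order matters.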
