I am new to Python and I am looking to merge multiple CSV files. I have two files as follows:

CSV1:
startp endp slack
S1 E1 -0.15
S4 E2 -10
S3 E3 -3.2

CSV2:
startp endp slack
S1 E1 -0.12
S2 E2 -4
S3 E3 -1.2

Merged CSV: I want something like this
startp endp slack_csv1 slack_csv2
S1 E1 -0.15 -0.12
S4 E2 -10 
S2 E2        -4
S3 E3 -3.2 -1.2

I wrote code like this 
    for file_name in all_csv"
        df=pd.read_csv(file_name)
        if i==0"
      df_t = df
      i=1
df_t=pd.merge(df_t,df)
print("after merge", df_t,df)

The output for df_t is empty after the second merge. If I try to merge on=endp, I get an error. Please help me understand how to do this.

1 Answer

Note: I'm assuming your CSVs have commas where you put spaces.

I'm not sure what the " after the for and if statements was for (I assume it was meant to be a colon), so I formatted the code the way I think you meant to write it.

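You were close to having it. You just have to pass a few more parameters to pd.merge(). By default it does an inner join on every column the two frames share, which here includes slack, so no rows match and the result is empty. Passing on=['startp', 'endp'] restricts the keys to those two columns, and how='outer' keeps rows that appear in only one file.

You may want to build the suffixes parameter from variables that indicate which file each column came from (see below for an extended answer that addresses this).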

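import pandas as pd

all_csv = ["csv1.csv", "csv2.csv"]
i = 0
for file_name in all_csv:
    df = pd.read_csv(file_name)
    if i == 0:
        # First file: nothing to merge with yet
        df_t = df
        i = 1
    else:
        # Outer join on the two key columns keeps rows found in only one file
        df_t = pd.merge(df_t, df, on=['startp', 'endp'], how='outer', suffixes=('_1', '_2'))

print("after merge", df_t)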

outputs (row order may vary with the pandas version):

  startp endp  slack_1  slack_2
0     S1   E1    -0.15    -0.12
1     S4   E2   -10.00      NaN
2     S3   E3    -3.20    -1.20
3     S2   E2      NaN    -4.00
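
To see why the original merge came back empty, here is a minimal sketch that compares the default call with the keyed outer join. It assumes the data from the question is comma-separated and reads it from in-memory strings instead of files; the variable names and the ('_csv1', '_csv2') suffixes are my own choices for illustration. As for the error with on=endp, one possible cause (an assumption, since the traceback isn't shown) is that endp was written without quotes.

```python
import io

import pandas as pd

# Sample data copied from the question, assumed to be comma-separated.
csv1 = io.StringIO("startp,endp,slack\nS1,E1,-0.15\nS4,E2,-10\nS3,E3,-3.2\n")
csv2 = io.StringIO("startp,endp,slack\nS1,E1,-0.12\nS2,E2,-4\nS3,E3,-1.2\n")
df1 = pd.read_csv(csv1)
df2 = pd.read_csv(csv2)

# Default call: inner join on ALL shared columns (startp, endp, slack).
# No row has the same slack in both files, so nothing matches.
default_merge = pd.merge(df1, df2)
print(len(default_merge))  # → 0

# Keyed call: only startp/endp are keys, and unmatched rows are kept.
keyed_merge = pd.merge(
    df1, df2, on=["startp", "endp"], how="outer", suffixes=("_csv1", "_csv2")
)
print(keyed_merge)
```

Rows that exist in only one file show NaN in the other file's slack column.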

Alternate solution that handles more than 2 files

Here I use pd.DataFrame.merge() instead of pd.merge(), but they accomplish the same task. I rename the slack column before merging, so the column names never collide and more than 2 files can be combined. This is just one way. You can reformat your prior code to handle more than 2 files as well.

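import pandas as pd

# all_csv is the list of file names defined above
df_combined = None
for csv_file in all_csv:
    df = pd.read_csv(csv_file)
    # Tag the slack column with its source file, e.g. 'slack_csv1.csv'
    df = df.rename(columns={'slack': 'slack_' + csv_file})
    if df_combined is None:
        # First file: start the combined frame from a copy
        df_combined = df.copy()
    else:
        # Later files: outer join on the key columns keeps unmatched rows
        df_combined = df_combined.merge(df, how='outer', on=['startp', 'endp'])

print(df_combined)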
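
If you want column names like slack_csv1 rather than slack_csv1.csv, one option is to strip the extension with pathlib. The sketch below wraps the loop above in a helper; merge_slack is a hypothetical name, and the third input is invented here purely to show that more than 2 sources work. It reads from in-memory strings so it runs without any files on disk, but pd.read_csv accepts file paths in the same place.

```python
import io
from pathlib import Path

import pandas as pd


def merge_slack(sources):
    """Outer-merge the slack columns of several CSVs.

    `sources` maps a file name to anything pd.read_csv can read
    (a path or a file-like object). Hypothetical helper for illustration.
    """
    combined = None
    for name, src in sources.items():
        df = pd.read_csv(src)
        # 'csv1.csv' -> 'slack_csv1'
        df = df.rename(columns={"slack": "slack_" + Path(name).stem})
        if combined is None:
            combined = df
        else:
            combined = combined.merge(df, how="outer", on=["startp", "endp"])
    return combined


demo = merge_slack({
    "csv1.csv": io.StringIO("startp,endp,slack\nS1,E1,-0.15\nS4,E2,-10\nS3,E3,-3.2\n"),
    "csv2.csv": io.StringIO("startp,endp,slack\nS1,E1,-0.12\nS2,E2,-4\nS3,E3,-1.2\n"),
    # Made-up third file, not from the question.
    "csv3.csv": io.StringIO("startp,endp,slack\nS1,E1,-0.30\n"),
})
print(demo)
```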