I would like to merge multiple DataFrames. The example below works, but the resulting column names are confusing:

import pandas as pd

def mergedf(dfs, countfiles, oncolumn, i=0):
    # Base case: the last frame has nothing left to merge with
    if i == (countfiles - 1):
        return dfs[i]

    # Merge the current frame with the merged result of all later frames
    dfm = dfs[i].merge(mergedf(dfs, countfiles, oncolumn, i=i + 1), on=oncolumn)
    return dfm

df1 = pd.DataFrame([["x", 1],["y", 2]], columns=["A", "B"])
df2 = pd.DataFrame([["x", 3],["y", 4]], columns=["A", "B"])
df3 = pd.DataFrame([["x", 5],["y", 6]], columns=["A", "B"])
df4 = pd.DataFrame([["x", 7],["y", 8]], columns=["A", "B"])
df5 = pd.DataFrame([["x", 9],["y", 10]], columns=["A", "B"])

print(df1)

dfs = [df1,df2,df3,df4,df5]
df = mergedf(dfs, len(dfs),'A')

print(df)

Current output:

   A  B
0  x  1
1  y  2

   A  B  B_x  B_y  B_x  B_y
0  x  1    3    5    7    9
1  y  2    4    6    8   10

The column names are (A, B, B_x, B_y, B_x, B_y), so some names are repeated. I would like the result columns to be named (A, B0, B1, B2, B3, B4), as below:

   A  B0  B1   B2   B3   B4
0  x  1    3    5    7    9
1  y  2    4    6    8   10
  • Why not change the column names before merging? Commented Feb 21, 2022 at 4:03
  • For abstraction purpose maybe.. Commented Feb 21, 2022 at 4:04

1 Answer

You can try something like this:

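def mergedf(dfs, countfiles, oncolumn):  # don't need i=0 for this
    # Move the joining column into the index so concat aligns on it
    dfsi = [df.set_index(oncolumn) for df in dfs]
    # keys adds an outer column level: (0, 'B'), (1, 'B'), ...
    df_out = pd.concat(dfsi, keys=range(countfiles), axis=1)
    # Flatten each (key, column) pair into a single name such as 'B0'
    df_out.columns = [f'{b}{a}' for a, b in df_out.columns]
    return df_out

df = mergedf(dfs, len(dfs), 'A').reset_index()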

Output:

   A  B0  B1  B2  B3  B4
0  x   1   3   5   7   9
1  y   2   4   6   8  10

Here, the keys parameter of pd.concat labels each frame's columns with that frame's position in the list (0, 1, 2, ...), and the list comprehension then flattens each (key, column) pair into a single name such as B0. Like most pandas operations, pd.concat aligns DataFrames on their index (intrinsic data alignment), so the joining column is moved into the index first and restored with reset_index afterwards.
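
To make the intermediate step visible, here is a small sketch using only the first two frames from the question. The variable names (indexed, wide, column_pairs, flat_names) are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame([["x", 1], ["y", 2]], columns=["A", "B"])
df2 = pd.DataFrame([["x", 3], ["y", 4]], columns=["A", "B"])

# Put the joining column in the index so concat aligns rows on it
indexed = [df.set_index("A") for df in (df1, df2)]

# keys adds an outer column level holding each frame's position
wide = pd.concat(indexed, keys=range(len(indexed)), axis=1)
column_pairs = list(wide.columns)  # (key, original column name) tuples

# Flatten each (key, name) pair into a single string label
wide.columns = [f"{name}{key}" for key, name in wide.columns]
flat_names = list(wide.columns)

print(column_pairs)
print(flat_names)
```

The first print shows the two-level column labels that concat produces, and the second shows the flattened names after the list comprehension.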


Option 2: You can rename and use join:

dfsi = [df.rename(columns={'B':f'B{i}'}).set_index('A') for i, df in enumerate(dfs)]
df_out = dfsi[0].join(dfsi[1:]).reset_index()

Output:

   A  B0  B1  B2  B3  B4
0  x   1   3   5   7   9
1  y   2   4   6   8  10
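
Option 3 (a sketch, not part of the original answer): as the first comment on the question suggests, the columns can also be renamed before merging. This keeps merge itself, whose default is an inner join, whereas pd.concat defaults to an outer join and join to a left join. The helper name merge_with_numbered_columns is hypothetical, and it assumes every non-key column should receive the frame's position as a suffix.

```python
from functools import reduce

import pandas as pd


def merge_with_numbered_columns(dfs, on):
    # Hypothetical helper: suffix every non-key column with the frame's position
    renamed = [
        df.rename(columns={c: f"{c}{i}" for c in df.columns if c != on})
        for i, df in enumerate(dfs)
    ]
    # Fold the list into one frame by merging left to right on the key column
    return reduce(lambda left, right: left.merge(right, on=on), renamed)


# Same data as df1..df5 in the question
dfs = [
    pd.DataFrame([["x", 1 + 2 * i], ["y", 2 + 2 * i]], columns=["A", "B"])
    for i in range(5)
]
result = merge_with_numbered_columns(dfs, "A")
print(result)
```

Because the names are already unique when merge runs, pandas has no need to add the _x and _y suffixes.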