pandas - merge multiple dataframe column name

Question

I would like to merge multiple DataFrames. The example below works, but the resulting column names are confusing:

import pandas as pd

def mergedf(dfs, countfiles,oncolumn,i=0):
    if i == (countfiles - 1):  # base case: the last DataFrame has nothing left to merge with
        return dfs[i]

    dfm = dfs[i].merge(mergedf(dfs, countfiles,oncolumn,i=i+1), on=oncolumn)
    return dfm

df1 = pd.DataFrame([["x", 1],["y", 2]], columns=["A", "B"])
df2 = pd.DataFrame([["x", 3],["y", 4]], columns=["A", "B"])
df3 = pd.DataFrame([["x", 5],["y", 6]], columns=["A", "B"])
df4 = pd.DataFrame([["x", 7],["y", 8]], columns=["A", "B"])
df5 = pd.DataFrame([["x", 9],["y", 10]], columns=["A", "B"])

print(df1)

dfs = [df1,df2,df3,df4,df5]
df = mergedf(dfs, len(dfs),'A')

print(df)

Current output:

   A  B
0  x  1
1  y  2

   A  B  B_x  B_y  B_x  B_y
0  x  1    3    5    7    9
1  y  2    4    6    8   10

The column names are (A, B, B_x, B_y, B_x, B_y), so some names are repeated. I would like the result to have the column names (A, B0, B1, B2, B3, B4), as below:

   A  B0  B1  B2  B3  B4
0  x   1   3   5   7   9
1  y   2   4   6   8  10

Why not change the column names before merging?

– Pab, Commented Feb 21, 2022 at 4:03
For abstraction purposes, maybe.

– Anurag Dhadse, Commented Feb 21, 2022 at 4:04

Scott Boston · Accepted Answer · 2022-02-21 04:19:42Z

You can try something like this:

def mergedf(dfs, countfiles,oncolumn): # don't need i=0 for this
    dfsi = [df.set_index(oncolumn) for df in dfs]
    df_out = pd.concat(dfsi, keys=range(countfiles), axis=1)
    df_out.columns = [f'{b}{a}' for a, b in df_out.columns]
    return df_out

df = mergedf(dfs, len(dfs), 'A').reset_index()

Output:

   A  B0  B1  B2  B3  B4
0  x   1   3   5   7   9
1  y   2   4   6   8  10

Here, the keys parameter of pd.concat labels each input DataFrame with its position (0 to 4), which produces a two-level column index of (key, column name) pairs; the list comprehension then flattens each pair into names such as B0 and B1. Like most pandas operations, pd.concat aligns DataFrames on their index (intrinsic data alignment), so we move the joining column into the index first and call reset_index afterwards.
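To see what keys adds before the columns are flattened, here is a minimal sketch using only two of the frames. The variable names indexed, wide and column_pairs are illustrative and do not appear in the answer above.

```python
import pandas as pd

df1 = pd.DataFrame([["x", 1], ["y", 2]], columns=["A", "B"])
df2 = pd.DataFrame([["x", 3], ["y", 4]], columns=["A", "B"])
dfs = [df1, df2]

# Move the join column into the index so concat aligns rows on it.
indexed = [df.set_index("A") for df in dfs]

# keys adds an outer column level, giving (key, original name) pairs.
wide = pd.concat(indexed, keys=range(len(indexed)), axis=1)
column_pairs = list(wide.columns)  # (0, 'B') and (1, 'B')

# Flatten each (key, name) pair into a single label: name followed by key.
wide.columns = [f"{name}{key}" for key, name in wide.columns]
result = wide.reset_index()
print(result)
```

Because alignment happens on the index, rows are matched by their value in A rather than by their position in each frame.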

Option 2: You can rename and use join:

dfsi = [df.rename(columns={'B':f'B{i}'}).set_index('A') for i, df in enumerate(dfs)]
df_out = dfsi[0].join(dfsi[1:]).reset_index()

Output:

   A  B0  B1  B2  B3  B4
0  x   1   3   5   7   9
1  y   2   4   6   8  10
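If the frames carry more than one value column, or merge is preferred over join, the same rename-first idea can be sketched with functools.reduce. This is an illustrative variant rather than part of the answer: merge_numbered is a hypothetical helper name, and the sketch assumes every frame contains the key column and that the default inner merge is wanted.

```python
from functools import reduce

import pandas as pd


def merge_numbered(dfs, on):
    """Suffix every non-key column with the frame's position, then merge all frames on `on`."""
    renamed = [
        df.rename(columns={col: f"{col}{i}" for col in df.columns if col != on})
        for i, df in enumerate(dfs)
    ]
    # Column names are already unique, so merge never needs its _x/_y suffixes.
    return reduce(lambda left, right: left.merge(right, on=on), renamed)


# Same data as df1..df5 in the question: B holds 1..10, two values per frame.
dfs = [
    pd.DataFrame({"A": ["x", "y"], "B": [2 * i + 1, 2 * i + 2]})
    for i in range(5)
]
merged = merge_numbered(dfs, "A")
print(merged)
```

Renaming before merging keeps the recursion out of the picture entirely and avoids the repeated B_x and B_y names, since no two frames share a non-key column name by the time they are merged.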
