4

I am trying to merge/join multiple Dataframes and so far I have no luck. I've found merge method, but it works only with two Dataframes. I also found this SO answer suggesting to do something like that:

df1.merge(df2,on='name').merge(df3,on='name')

Unfortunatelly it will not work in my case, because I have 20+ number of dataframes.

My next idea was to use join. According to the reference when joining multiple dataframes I need to use list and only I can join on index column. So I changed indexes for all of the columns (ok, it can be done grammatically easily) and end up with something like this:

df.join([df1,df2,df3])

Unfortunately, also this approach failed, because other columns names are this same in all dataframes. I've decided to do the last thing, that is renaming all columns. But when I finally joined everything: df = pd.Dataframe() df.join([df1,df2,df3])

I've received empty dataframe. I have no more idea, how I can join them. Can someone suggest anything more?

EDIT1:

Sample input:

import pandas as pd

df1 = pd.DataFrame(np.array([
    ['a', 5, 19],
    ['b', 14, 16],
    ['c', 4, 9]]),
    columns=['name', 'attr1', 'attr2'])
df2 = pd.DataFrame(np.array([
    ['a', 15, 49],
    ['b', 4, 36],
    ['c', 14, 9]]),
    columns=['name', 'attr1', 'attr2'])

df1 
  name attr1 attr2
0    a     5    19
1    b    14    16
2    c     4     9

df2
  name attr1 attr2
0    a    15    49
1    b     4    36
2    c    14     9

Expected output:

df
  name attr1_1 attr2_1 attr1_2 attr2_2
0    a     5    19      15      49
1    b    14    16      4       36
2    c     4     9      14      9

Indexes might be unordered between dataframes, but it is guaranteed, that they will exists.

2
  • 3
    If the column names are the same in all the DataFrames, then what do you want the output to actually look like? Commented Jul 19, 2016 at 19:46
  • @chrisaycock I've added sample input and expected output. I've based it on merge, where pandas automatically adds suffixes to names of other columns Commented Jul 19, 2016 at 19:54

3 Answers 3

8

use pd.concat

dflist = [df1, df2]
keys = ["%d" % i for i in range(1, len(dflist) + 1)]

merged = pd.concat([df.set_index('name') for df in dflist], axis=1, keys=keys)
merged.columns = merged.swaplevel(0, 1, 1).columns.to_series().str.join('_')

merged

enter image description here

Or

merged.reset_index()

enter image description here

Sign up to request clarification or add additional context in comments.

Comments

1

use reduce:

def my_merge(df1, df2):
    return df1.merge(df2,on='name')

final_df = reduce(my_merge, df_list)

considering df_list to be a list of your dataframes

Comments

-1

The solution of @piRSquared works for 20+ dataframes, see the following script for creating 20+ example dataframes:

N = 25
dflist = []

for d in range(N):
    df = pd.DataFrame(np.random.rand(3,2))
    df.columns = ['attr1', 'attr2']

    df['name'] = ['a', 'b', 'c']

    dflist.append(df)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.