I have multiple data frames that I would like to concatenate. Some of them lack certain columns, and those columns should be filled with NA.
import pandas as pd
df1_1 = pd.DataFrame({'id':[1,1,2,2,3,3], 'age':[22,22,55,55,53,53], 'group':1,'y':[1,2,3,4,5,6]})
df1_2 = pd.DataFrame({'id':[1,1,2,2,3,3], 'age':[22,22,55,55,53,53], 'group':1,'w':[7,8,9,10,11,12]})
df2_1 = pd.DataFrame({'id':[4,4,5,5], 'age':[39,39,54,54], 'group':2,'y':[1,2,3,4]})
df2_2 = pd.DataFrame({'id':[4,4,5,5], 'age':[39,39,54,54], 'group':2,'w':[5,6,7,8]})
df3 = pd.DataFrame({'id':[1,1,6,6,7,7,8,8], 'age':[23,23,63,63,43,43,25,25],'group':3,'w':[1,2,3,4,5,6,7,8]})
Desired output:
id age group y w
1 22 1 1 7
1 22 1 2 8
2 55 1 3 9
2 55 1 4 10
3 53 1 5 11
3 53 1 6 12
4 39 2 1 5
4 39 2 2 6
5 54 2 3 7
5 54 2 4 8
1 23 3 NA 1
1 23 3 NA 2
6 63 3 NA 3
6 63 3 NA 4
7 43 3 NA 5
7 43 3 NA 6
8 25 3 NA 7
8 25 3 NA 8
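I tried two approaches:
from functools import reduce
dfs = [df1_1, df1_2, df2_1, df2_2, df3]
# Attempt 1: successive outer merges on the shared key columns
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['id', 'group', 'age'], how='outer'), dfs)
# Attempt 2: stack all frames row-wise
df_merged = pd.concat(dfs, join='outer', axis=0)
but neither attempt produced the desired output.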
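For reference, here is a minimal sketch of one way the desired output might be produced. It rests on an assumption that is not stated above: rows sharing the same id, age and group are meant to be paired in their order of appearance (first y with first w, second y with second w). The key columns are not unique, so a plain merge on them pairs every matching row with every other, and a row-wise concat leaves y and w on separate rows. The sketch numbers the repeated rows with a helper column, here called n (a made-up name), stacks everything, and then collapses rows that share the same keys and the same n.

```python
import pandas as pd

df1_1 = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3], 'age': [22, 22, 55, 55, 53, 53], 'group': 1, 'y': [1, 2, 3, 4, 5, 6]})
df1_2 = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3], 'age': [22, 22, 55, 55, 53, 53], 'group': 1, 'w': [7, 8, 9, 10, 11, 12]})
df2_1 = pd.DataFrame({'id': [4, 4, 5, 5], 'age': [39, 39, 54, 54], 'group': 2, 'y': [1, 2, 3, 4]})
df2_2 = pd.DataFrame({'id': [4, 4, 5, 5], 'age': [39, 39, 54, 54], 'group': 2, 'w': [5, 6, 7, 8]})
df3 = pd.DataFrame({'id': [1, 1, 6, 6, 7, 7, 8, 8], 'age': [23, 23, 63, 63, 43, 43, 25, 25], 'group': 3, 'w': [1, 2, 3, 4, 5, 6, 7, 8]})

keys = ['id', 'age', 'group']
frames = [df1_1, df1_2, df2_1, df2_2, df3]

# Assumption: the k-th row for a given (id, age, group) in one frame belongs
# with the k-th row for the same keys in another frame. 'n' is a hypothetical
# helper column holding that occurrence number (0, 1, ...).
numbered = [f.assign(n=f.groupby(keys).cumcount()) for f in frames]

# Stack all rows; columns missing from a frame are filled with NaN.
stacked = pd.concat(numbered, ignore_index=True)

# Collapse rows that share keys and occurrence number. GroupBy.first() keeps
# the first non-null value per column, so y and w end up on the same row.
result = (
    stacked.groupby(keys + ['n'], sort=False, as_index=False)
    .first()
    .drop(columns='n')
)

print(result)
```

Because the missing values are NaN, y and w come out as floats; if integer columns are needed, converting them to the nullable Int64 dtype afterwards is one option.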