Just do this:
pd.concat([df1,df2], axis=1).sum(axis=1)
pd.concat will merge the 2(or more) frames and match based on index. sum(axis=1) just sums across the rows.
Here's the working example:
#create the example data
df1 = pd.DataFrame({'index':[0,1,2,4],'value':[10,12,15,20]}).set_index('index')
df2 = pd.DataFrame({'index':[0,1,3,4],'value':[10,10,10,10]}).set_index('index')
The above will give you:
In [7]: pd.concat([df1,df2],axis=1).sum(axis=1)
Out[7]:
index
0 20.0
1 22.0
2 15.0
3 10.0
4 30.0
dtype: float64
Timing Update
A comment wanted to see which solution is faster. tldr, using .add() is about 2x faster than pd.concat, but if you use .align explicitly, it'll be even faster, albeit marginally
# create example dataframe
N=1000000
np.random.seed(666)
df1 = pd.DataFrame({'A':np.random.randint(0,1000, size=N),"B":np.random.randint(0,1000, size=N)}, index=np.arange(0,N,1))
df2 = pd.DataFrame({'A':np.random.randint(0,1000, size=N),"B":np.random.randint(0,1000, size=N)}, index=np.arange(0,N,1))
# punch a few holes to mismatch integers
df1 = df1.loc[np.random.choice(df1.index,int(.8*N), replace=False),:].sort_index()
df2 = df2.loc[np.random.choice(df1.index,int(.8*N), replace=False),:].sort_index()
# define functions
def summatch1(x,y):
return pd.concat([x,y],axis=1).sum(axis=1)
def summatch2(x,y):
return x.add(y,fill_value=0)
def summatch3(x,y):
z = x.merge(y,how='outer',suffixes=['1','2'],left_index=True, right_index=True).fillna(0)
return z.A1+z.A2
def summatch4(x,y):
p1,p2 = x.align(y,join='outer', fill_value=0)
return p1+p2
# time them
%timeit summatch1(df1.A, df2.A)
%timeit summatch2(df1.A, df2.A)
%timeit summatch3(df1.loc[:,['A']], df2.loc[:,['A']])
%timeit summatch4(df1.A, df2.A)
Results
Times are reported in the following order:
pd.concat
df1.add
pd.merge
df1.align
800k rows in df1 and df2
79.7 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
36.6 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
40.2 ms ± 1.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
31.3 ms ± 1.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
80k rows in df1 and df2
7.2 ms ± 221 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.08 ms ± 90.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.14 ms ± 43.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.76 ms ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)