9

I'm trying to figure out if Panda's, when adding two series together, automatically matches on index or if it simply adds by element position. If its just by position, is there a way to get it to add on index? I've looked at merging, but I'm not very clear if the key in this case can be the index of both...

For example, if I have do DFs:

df1 = index  value
        0      10
        1      12
        2      15
        4      20


df2 = index  value
        0      10
        1      10
        3      10
        4      10

and I want to add df1[total] = df1[value] + df2[value] =

df1 = index  value
        0      20
        1      22
        2      15
        3      10
        4      30

Thanks for your help in advance!

2 Answers 2

6

Because of the intrinsic data alignment in pandas, you can use add with fill_value=0 and it will sum these two series based on index alignment.

df1.add(df2,fill_value=0)

Input:

df1 = pd.Series([10]*4,index=[0,1,3,4])

df2 = pd.Series([10,12,15,20], index=[0,1,2,4])

df1.add(df2,fill_value=0)

Output:

0    20.0
1    22.0
2    15.0
3    10.0
4    30.0
dtype: float64
Sign up to request clarification or add additional context in comments.

Comments

4

Just do this:

pd.concat([df1,df2], axis=1).sum(axis=1)

pd.concat will merge the 2(or more) frames and match based on index. sum(axis=1) just sums across the rows.

Here's the working example:

#create the example data
df1 = pd.DataFrame({'index':[0,1,2,4],'value':[10,12,15,20]}).set_index('index')
df2 = pd.DataFrame({'index':[0,1,3,4],'value':[10,10,10,10]}).set_index('index')

The above will give you:

In [7]: pd.concat([df1,df2],axis=1).sum(axis=1)
Out[7]:
index
0    20.0
1    22.0
2    15.0
3    10.0
4    30.0
dtype: float64

Timing Update

A comment wanted to see which solution is faster. tldr, using .add() is about 2x faster than pd.concat, but if you use .align explicitly, it'll be even faster, albeit marginally

# create example dataframe
N=1000000
np.random.seed(666)
df1 = pd.DataFrame({'A':np.random.randint(0,1000, size=N),"B":np.random.randint(0,1000, size=N)}, index=np.arange(0,N,1))
df2 = pd.DataFrame({'A':np.random.randint(0,1000, size=N),"B":np.random.randint(0,1000, size=N)}, index=np.arange(0,N,1))

# punch a few holes to mismatch integers
df1 = df1.loc[np.random.choice(df1.index,int(.8*N), replace=False),:].sort_index()
df2 = df2.loc[np.random.choice(df1.index,int(.8*N), replace=False),:].sort_index()

# define functions
def summatch1(x,y):
    return pd.concat([x,y],axis=1).sum(axis=1)

def summatch2(x,y):
    return x.add(y,fill_value=0)

def summatch3(x,y):
    z = x.merge(y,how='outer',suffixes=['1','2'],left_index=True, right_index=True).fillna(0)
    return z.A1+z.A2

def summatch4(x,y):
    p1,p2 = x.align(y,join='outer', fill_value=0)
    return p1+p2

# time them
%timeit summatch1(df1.A, df2.A)
%timeit summatch2(df1.A, df2.A)
%timeit summatch3(df1.loc[:,['A']], df2.loc[:,['A']])
%timeit summatch4(df1.A, df2.A)

Results

Times are reported in the following order:

  1. pd.concat
  2. df1.add
  3. pd.merge
  4. df1.align

800k rows in df1 and df2

79.7 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
36.6 ms ± 1.41 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
40.2 ms ± 1.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
31.3 ms ± 1.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

80k rows in df1 and df2

7.2 ms ± 221 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.08 ms ± 90.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.14 ms ± 43.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.76 ms ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

2 Comments

Will this be faster or slow than @scott Boston answer of adding two series with pad value=0?
@limKaizhuo not exactly timely (pun not intended) but I timed the various options. .add is faster but using .align is even faster, albeit marginally.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.