Efficient calculation on a pandas dataframe

Question

I need to make my code faster. The problem is very simple, but I can't find a good way to do the calculation without looping through the whole DataFrame.

I've got three DataFrames: A, B and C.

A and B have 3 columns each, and the following format:

A (10 rows):

     Canal Gerencia grad
0    'ABC'   'DEF'   23
etc...

B (25 rows):

     Marca  Formato  grad
0    'GHI'   'JKL'    43
etc...

DataFrame C, on the other hand, has 5 columns:

C (5000 rows):

     Marca  Formato  Canal  Gerencia  grad
0    'GHI'   'JKL'    'ABC'   'DEF'   -102
etc...

I need a vector with the same length as DataFrame 'C' that adds up the values of 'grad' from the three tables. For example:

m = 'GHI'
f = 'JKL'
c = 'ABC'
g = 'DEF'
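# Combine the conditions into one boolean mask per table and sum the matching 'grad' values.
res = (C.loc[(C['Marca']==m) & (C['Formato']==f) & (C['Canal']==c) & (C['Gerencia']==g), 'grad'].sum()
       + A.loc[(A['Canal']==c) & (A['Gerencia']==g), 'grad'].sum()
       + B.loc[(B['Formato']==f) & (B['Marca']==m), 'grad'].sum())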
>>-36

I tried looping through the C DataFrame, but it is too slow. I understand I should avoid looping through the DataFrame, but I don't know how to do that. My current code is the following (it works, but it is VERY slow):

res=[]
for row_index, row in C.iterrows():
    vec1 = A['Gerencia']==row['Gerencia']
    vec2 = A['Canal']==row['Canal']
    vec3 = B['Marca']==row['Marca']
    vec4 = B['Formato']==row['Formato']
    grad = row['grad']
    res.append(grad + A['grad'][vec1 & vec2].sum() + B['grad'][vec3 & vec4].sum())

I would really appreciate any help on making this routine quicker. Thank you!

unutbu · Accepted Answer · 2015-07-04 01:23:34Z

If I understand correctly, you need to merge C with A:

C = pd.merge(C, A, on=['Canal', 'Gerencia'])

(this will add a column to it) and then merge the result with B:

C = pd.merge(C, B, on=['Marca', 'Formato'])

(again adding a column to C)
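To see which columns the two merges actually add, here is a minimal sketch on one-row stand-ins for the question's tables. The data values are taken from the question's example rows; `merged` and `grad_columns` are names introduced here for illustration only.

```python
import pandas as pd

# One-row stand-ins for the question's tables.
A = pd.DataFrame({"Canal": ["ABC"], "Gerencia": ["DEF"], "grad": [23]})
B = pd.DataFrame({"Marca": ["GHI"], "Formato": ["JKL"], "grad": [43]})
C = pd.DataFrame({
    "Marca": ["GHI"], "Formato": ["JKL"],
    "Canal": ["ABC"], "Gerencia": ["DEF"], "grad": [-102],
})

# First merge: both sides have a 'grad' column, so pandas appends its
# default suffixes, '_x' for the left frame (C) and '_y' for the right (A).
merged = pd.merge(C, A, on=["Canal", "Gerencia"])

# Second merge: the left side no longer has a plain 'grad' column,
# so B's 'grad' is added without any suffix.
merged = pd.merge(merged, B, on=["Marca", "Formato"])

grad_columns = [col for col in merged.columns if col.startswith("grad")]
print(grad_columns)
```

Passing `suffixes=(...)` to `pd.merge` is one way to get more descriptive names than the defaults.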

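At this point, check C for the names of the 'grad' columns. With the default merge suffixes they should be grad_x (from C), grad_y (from A) and grad (from B), because only the first merge has a name clash. Then just add them:

C['grad_x'] + C['grad_y'] + C['grad']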
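One caveat: `pd.merge` defaults to an inner join, which drops rows of C that have no match in A or B, whereas the question asks for a result with the same length as C. The sketch below is one possible variant, not the answer's exact code: it uses left merges, treats missing matches as 0 (as the question's loop does when it sums an empty selection), and pre-aggregates A and B so duplicate keys cannot multiply rows. The helper `add_grads`, the column names `grad_A` and `grad_B`, and the sample rows beyond the question's first row are assumptions made for illustration.

```python
import pandas as pd

def add_grads(A, B, C):
    """Return C['grad'] plus the matching 'grad' sums from A and B, one value per row of C."""
    # Collapse duplicate keys first so the merges cannot multiply rows of C.
    a_sum = A.groupby(["Canal", "Gerencia"], as_index=False)["grad"].sum()
    b_sum = B.groupby(["Marca", "Formato"], as_index=False)["grad"].sum()
    a_sum = a_sum.rename(columns={"grad": "grad_A"})
    b_sum = b_sum.rename(columns={"grad": "grad_B"})

    # how="left" keeps every row of C, in C's order, even when there is no match.
    merged = C.merge(a_sum, on=["Canal", "Gerencia"], how="left")
    merged = merged.merge(b_sum, on=["Marca", "Formato"], how="left")

    # Unmatched keys come back as NaN; count them as 0.
    total = merged["grad"] + merged["grad_A"].fillna(0) + merged["grad_B"].fillna(0)
    total.index = C.index  # restore C's original index labels
    return total

# Illustrative data: the first row of each table mirrors the question.
A = pd.DataFrame({"Canal": ["ABC", "XYZ"], "Gerencia": ["DEF", "DEF"], "grad": [23, 5]})
B = pd.DataFrame({"Marca": ["GHI"], "Formato": ["JKL"], "grad": [43]})
C = pd.DataFrame({
    "Marca": ["GHI", "GHI", "MMM"],
    "Formato": ["JKL", "JKL", "JKL"],
    "Canal": ["ABC", "QQQ", "XYZ"],
    "Gerencia": ["DEF", "DEF", "DEF"],
    "grad": [-102, 10, 1],
})

res = add_grads(A, B, C)
print(res.tolist())
```

The first entry reproduces the -36 from the question's example. The other two rows each lack a match in one of the tables and still appear in the result.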

edited Jul 4, 2015 at 1:23

unutbu

answered Jul 3, 2015 at 20:40

Ami Tavory
