I need to make my code faster. The problem is simple, but I can't find a good way to do the calculation without looping through the whole DataFrame.
I have three DataFrames: A, B and C.
A and B have 3 columns each, and the following format:
A (10 rows):
Canal Gerencia grad
0 'ABC' 'DEF' 23
etc...
B (25 rows):
Marca Formato grad
0 'GHI' 'JKL' 43
etc...
DataFrame C, on the other hand, has 5 columns:
C (5000 rows):
Marca Formato Canal Gerencia grad
0 'GHI' 'JKL' 'ABC' 'DEF' -102
etc...
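I need a vector with the same length as DataFrame C that, for each row of C, adds up the values of 'grad' from the three tables: the row's own 'grad', the 'grad' of the row in A with the same Canal and Gerencia, and the 'grad' of the row in B with the same Marca and Formato. For example: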
m = 'GHI'
f = 'JKL'
c = 'ABC'
g = 'DEF'
res = (C.loc[(C['Marca']==m) & (C['Formato']==f) & (C['Canal']==c) & (C['Gerencia']==g), 'grad'].sum()
       + A.loc[(A['Canal']==c) & (A['Gerencia']==g), 'grad'].sum()
       + B.loc[(B['Formato']==f) & (B['Marca']==m), 'grad'].sum())
>>-36
I tried looping through DataFrame C, but it is too slow. I understand I should avoid looping over the DataFrame, but I don't know how to do that. My current code is the following (it works, but it is VERY slow):
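res = []
for row_index, row in C.iterrows():
    # Masks for the rows of A and B whose keys match the current row of C
    vec1 = A['Gerencia'] == row['Gerencia']
    vec2 = A['Canal'] == row['Canal']
    vec3 = B['Marca'] == row['Marca']
    vec4 = B['Formato'] == row['Formato']
    grad = row['grad']
    # Combine each pair of masks with & instead of chaining [vec1][vec2]
    res.append(grad + sum(A['grad'][vec1 & vec2]) + sum(B['grad'][vec3 & vec4]))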
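One way this could be vectorized, as a sketch only, is to let pandas match the key columns with two left merges instead of filtering A and B once per row. The toy rows beyond row 0 of each table are made up for illustration, and the helper name add_grads is hypothetical. The sketch sums 'grad' per key pair first, so duplicated keys in A or B behave like sum() in the loop, and it treats rows of C with no match as contributing 0.

```python
import pandas as pd

# Toy data: row 0 of each frame follows the example above; the other rows are made up.
A = pd.DataFrame({
    "Canal": ["ABC", "XYZ"],
    "Gerencia": ["DEF", "DEF"],
    "grad": [23, 5],
})
B = pd.DataFrame({
    "Marca": ["GHI", "GHI"],
    "Formato": ["JKL", "MNO"],
    "grad": [43, 7],
})
C = pd.DataFrame({
    "Marca": ["GHI", "GHI", "QQQ"],
    "Formato": ["JKL", "MNO", "JKL"],
    "Canal": ["ABC", "XYZ", "ABC"],
    "Gerencia": ["DEF", "DEF", "DEF"],
    "grad": [-102, 10, 1],
})


def add_grads(A, B, C):
    """Hypothetical helper: C['grad'] plus the matching 'grad' sums from A and B."""
    # Sum 'grad' per key pair so duplicated keys behave like sum() in the loop.
    a = A.groupby(["Canal", "Gerencia"], as_index=False)["grad"].sum()
    b = B.groupby(["Marca", "Formato"], as_index=False)["grad"].sum()
    a = a.rename(columns={"grad": "grad_A"})
    b = b.rename(columns={"grad": "grad_B"})

    # Left merges keep every row of C, in C's original order.
    merged = C.merge(a, on=["Canal", "Gerencia"], how="left")
    merged = merged.merge(b, on=["Marca", "Formato"], how="left")

    # Rows of C without a match in A or B get 0 for that term.
    total = merged["grad"] + merged["grad_A"].fillna(0) + merged["grad_B"].fillna(0)
    total.index = C.index  # restore C's index, since merge resets it
    return total


res = add_grads(A, B, C)
print(res.tolist())
```

The first element of res corresponds to the example above (-102 + 23 + 43). Whether this is actually faster on the real tables would need to be timed against the loop.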
I would really appreciate any help on making this routine quicker. Thank you!