Efficient nested looping with pandas dataframe

Question

I have a simple pandas DataFrame like this one:

d = {'col1': ['a','b','c','d','e'], 'col2': [1,2,3,4,5]}
df = pd.DataFrame(d)
df
  col1  col2
0    a     1
1    b     2
2    c     3
3    d     4
4    e     5

I need to iterate over it and compute a simple arithmetic result (such as a product) for every combination of row values. I was thinking of creating a matrix and filling in the values, like this:

size = df.shape[0]
mtx = np.zeros(shape=(size, size))
mtx
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

But I sense there is a more efficient way to do this than nested looping, like this:

for index1, c11, c12 in df.itertuples():
    for index2, c21, c22 in df.itertuples():
        mtx[index1][index2] = float(c12) * float(c22)

mtx
array([[  1.,   2.,   3.,   4.,   5.],
       [  2.,   4.,   6.,   8.,  10.],
       [  3.,   6.,   9.,  12.,  15.],
       [  4.,   8.,  12.,  16.,  20.],
       [  5.,  10.,  15.,  20.,  25.]])

Any idea will be much appreciated! Thanks!

Miriam Farber · Accepted Answer · 2017-03-29 10:53:30Z

For operations like *, +, -, and / you can do the following (this example uses *, but you can change the operator in the last line if you want +, -, or /):

import numpy as np
import pandas as pd
d = {'col1': ['a','b','c','d','e'], 'col2': [1,2,3,4,5]}
df = pd.DataFrame(d)
a=np.array([df.col2.tolist()])
a.T*a

The result is:

array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10],
       [ 3,  6,  9, 12, 15],
       [ 4,  8, 12, 16, 20],
       [ 5, 10, 15, 20, 25]], dtype=int64)
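As a sanity check, the broadcast product can be compared against the nested-loop matrix from the question. This is only a sketch: the names loop_mtx, broadcast_mtx, and same are chosen here for illustration and do not appear in the original posts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c', 'd', 'e'], 'col2': [1, 2, 3, 4, 5]})

# Nested-loop version, as in the question
size = df.shape[0]
loop_mtx = np.zeros(shape=(size, size))
for index1, c11, c12 in df.itertuples():
    for index2, c21, c22 in df.itertuples():
        loop_mtx[index1, index2] = float(c12) * float(c22)

# Broadcast version: a has shape (1, 5) and a.T has shape (5, 1),
# so a.T * a expands to a (5, 5) matrix of pairwise products
a = np.array([df.col2.tolist()])
broadcast_mtx = a.T * a

# Both approaches should produce the same values
same = np.array_equal(loop_mtx, broadcast_mtx)
print(same)
```

The broadcast version keeps the integer dtype of col2, while the loop fills a float matrix; the comparison is on values only.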

Change a.T*a to a.T+a for the pairwise sum, and to a.T-a for the pairwise difference. For pairwise division, use a.T/a. On Python 2, remember to add the line a=a.astype(float) above the operation so that the result is not integer division; on Python 3, / already returns floats for integer arrays.
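The same pairwise tables can probably also be built without the intermediate list, using the .outer method that NumPy ufuncs provide. The sketch below assumes a pandas version that has Series.to_numpy() (0.24 or later); the variable names vals, product, total, difference, ratio, and labeled are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'b', 'c', 'd', 'e'], 'col2': [1, 2, 3, 4, 5]})

# 1-D array of the column values (assumes pandas >= 0.24 for to_numpy)
vals = df['col2'].to_numpy()

# result[i, j] = vals[i] <op> vals[j], matching a.T <op> a
product = np.multiply.outer(vals, vals)
total = np.add.outer(vals, vals)
difference = np.subtract.outer(vals, vals)
ratio = np.true_divide.outer(vals, vals)  # float result, no astype needed

# Optional: label rows and columns with col1 for readability
labeled = pd.DataFrame(product, index=df['col1'], columns=df['col1'])
print(labeled)
```

With the labels attached, a single pair can be looked up by name, for example labeled.loc['b', 'c'].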

edited Mar 29, 2017 at 10:53

answered Mar 29, 2017 at 10:35

Miriam Farber
