Suppose I have the following NumPy array / pandas DataFrame:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| -- | -- | -- | -- | -- | -- | -- |
| 39 | 27 | 36 | 30 | 32 | 29 | 40 |
| 36 | 26 | 32 | 37 | 30 | 40 | 28 |
| 32 | 40 | 35 | 30 | 28 | 39 | 31 |
| 27 | 34 | 28 | 28 | 31 | 35 | 40 |
| 36 | 29 | 26 | 26 | 25 | 39 | 33 |
| 39 | 30 | 26 | 29 | 38 | 40 | 37 |
| 31 | 28 | 30 | 37 | 29 | 38 | 32 |
| 26 | 39 | 34 | 40 | 35 | 25 | 36 |
| 35 | 38 | 31 | 38 | 40 | 28 | 39 |
| 25 | 35 | 40 | 27 | 27 | 30 | 27 |
| 32 | 30 | 31 | 35 | 38 | 25 | 32 |
| 30 | 38 | 35 | 36 | 30 | 37 | 34 |
| 33 | 31 | 36 | 32 | 30 | 25 | 25 |
| 36 | 31 | 30 | 38 | 39 | 30 | 38 |
| 25 | 29 | 31 | 30 | 27 | 36 | 38 |
I want to run f(coli, colj) on each column pair, so f(0,0), f(0,1), ..., f(0,6), ..., f(6,6), and get a 7x7 array (the columns are numbered 0 to 6). I was able to achieve this relatively fast using a nested loop, which is OK. The problem I have run into is that I also need to compare each outcome of f(coli, colj) with every other outcome, i.e. g(f(i,j), f(k,m)), which produces a 7x7x7x7 array. A 4-level nested loop takes about a minute to run 😩.
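    cols = array.T  # iterate over columns
    for i, ci in enumerate(cols):
        for j, cj in enumerate(cols):
            for k, ck in enumerate(cols):
                for m, cm in enumerate(cols):
                    # index with integer positions, not the column arrays
                    output[i, j, k, m] = g(f(ci, cj), f(ck, cm))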
Is there a faster way with broadcasting?
To rephrase the question: how would you apply a function to all possible pairs of columns to create a 2D array, and then apply a second function to all possible pairs of entries of that 2D array? Hope that makes sense 😊
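
To make the two-stage idea concrete, here is a minimal sketch of one way it might be done with broadcasting. It rests on assumptions that are not stated in the question: `f` reduces two columns to a single scalar, and `g` works elementwise on NumPy arrays. The `f` and `g` below are hypothetical stand-ins (a dot product and an absolute difference), `pairwise_4d` is a made-up helper name, and the data is random rather than the table above.

```python
import numpy as np

def f(col_i, col_j):
    # Hypothetical stand-in: reduces two columns to one scalar.
    return float(np.dot(col_i, col_j))

def g(a, b):
    # Hypothetical stand-in: must accept arrays and work elementwise.
    return np.abs(a - b)

def pairwise_4d(array):
    cols = array.T
    # Stage 1: evaluate f once per column pair (n*n calls),
    # instead of re-evaluating it inside the 4-level loop.
    F = np.array([[f(ci, cj) for cj in cols] for ci in cols])
    # Stage 2: broadcast (n, n, 1, 1) against (1, 1, n, n),
    # so result[i, j, k, m] == g(F[i, j], F[k, m]).
    return g(F[:, :, None, None], F[None, None, :, :])

rng = np.random.default_rng(0)
array = rng.integers(25, 41, size=(15, 7))  # same shape as the table, random values
output = pairwise_4d(array)
print(output.shape)  # (7, 7, 7, 7)
```

Most of the saving in this sketch comes from computing the 2D array of f values only once. If `g` only accepts scalars, `np.vectorize(g)` gives the same broadcasting interface, but it still loops in Python and is unlikely to be much faster.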