Calculations on column combinations in a numpy array

Question

Supposing that I have the following numpy array / pandas df:

| 0  | 1  | 2  | 3  | 4  | 5  | 6  |
| -- | -- | -- | -- | -- | -- | -- |
| 39 | 27 | 36 | 30 | 32 | 29 | 40 |
| 36 | 26 | 32 | 37 | 30 | 40 | 28 |
| 32 | 40 | 35 | 30 | 28 | 39 | 31 |
| 27 | 34 | 28 | 28 | 31 | 35 | 40 |
| 36 | 29 | 26 | 26 | 25 | 39 | 33 |
| 39 | 30 | 26 | 29 | 38 | 40 | 37 |
| 31 | 28 | 30 | 37 | 29 | 38 | 32 |
| 26 | 39 | 34 | 40 | 35 | 25 | 36 |
| 35 | 38 | 31 | 38 | 40 | 28 | 39 |
| 25 | 35 | 40 | 27 | 27 | 30 | 27 |
| 32 | 30 | 31 | 35 | 38 | 25 | 32 |
| 30 | 38 | 35 | 36 | 30 | 37 | 34 |
| 33 | 31 | 36 | 32 | 30 | 25 | 25 |
| 36 | 31 | 30 | 38 | 39 | 30 | 38 |
| 25 | 29 | 31 | 30 | 27 | 36 | 38 |

I want to run f(coli, colj) on each column pair, so f(0,1), f(0,2), ..., f(0,6), ..., f(6,6), and obtain a 7x7 array (there are 7 columns, 0 to 6). I was able to achieve this relatively fast using a nested loop, which is OK. The problem I have run into is that I also need to compare the outcomes of f(coli, colj) with each other, so g(f(i,j), f(k,m)), which produces a 7x7x7x7 array. A 4D nested loop takes about a minute to run 😩.

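cols = array.T  # iterate over columns; enumerate supplies the integer indices
for a, ci in enumerate(cols):
    for b, cj in enumerate(cols):
        for c, ck in enumerate(cols):
            for d, cm in enumerate(cols):
                output[a, b, c, d] = g(f(ci, cj), f(ck, cm))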

Is there a faster way with broadcasting?

To rephrase the question: how would you apply a function to all possible pairs of columns to create a 2D array, and then choose all pairs again from that array and repeat the same process? Hope that makes sense 😊

Could you add a small example with input and output of what you want?
– bvdl, Commented Jul 5, 2021 at 11:36

score 1 · Accepted Answer · 2021-07-05 11:58:44Z


To rephrase the question, how would you perform a certain function by choosing all possible pairs from the column to create a 2D array..

Assuming you have a dataframe df with shape (n, m), i.e.:

n, m = df.shape

Use np.mgrid to create indices (i, j) of all pairs of columns:

i, j = np.mgrid[:m, :m].reshape((2, -1))

(i and j now both have shape (m**2,))
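To see what these index arrays contain, here is a small illustration with a hypothetical m = 3 (the question's table has m = 7). Position p in the two arrays names the column pair (i[p], j[p]).

```python
import numpy as np

m = 3  # hypothetical number of columns, kept small so the output is readable
i, j = np.mgrid[:m, :m].reshape((2, -1))

print(i)  # [0 0 0 1 1 1 2 2 2]
print(j)  # [0 1 2 0 1 2 0 1 2]
```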

..and then index your dataframe using i and j for each arg to f respectively:

f_res = f(df.iloc[:, i], df.iloc[:, j])

(f_res now has shape (n, m**2))
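The same step can be sketched on a plain NumPy array. Everything concrete here is an assumption for illustration: the question never defines f, so an element-wise absolute difference stands in for it, and the data is random rather than the table from the question. Working on the array instead of the DataFrame also sidesteps pandas label alignment, which may get in the way when f does arithmetic on frames with repeated column labels.

```python
import numpy as np

def f(a, b):
    # Hypothetical stand-in for the asker's f: element-wise absolute difference.
    return np.abs(a - b)

rng = np.random.default_rng(0)
arr = rng.integers(25, 41, size=(15, 7))  # made-up data, same shape as the question's table
n, m = arr.shape

i, j = np.mgrid[:m, :m].reshape((2, -1))
f_res = f(arr[:, i], arr[:, j])           # shape (n, m**2)

# Column p of f_res holds f(arr[:, i[p]], arr[:, j[p]]),
# so reshaping gives f_res_grid[:, a, b] == f(arr[:, a], arr[:, b]).
f_res_grid = f_res.reshape((n, m, m))
```

This only works as written if f operates element-wise on arrays; a Python-level f that expects two 1D columns would need to be vectorized first.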

Now you can repeat the same but on "f_res" for the args to g:

i, j = np.mgrid[:m**2, :m**2].reshape((2, -1))

(i and j now both have shape (m**4,))

g_res = g(f_res.iloc[:, i], f_res.iloc[:, j])

(g_res now has shape (n, m**4))
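A NumPy sketch of this second stage follows, again under assumptions: f and g are hypothetical element-wise stand-ins (absolute difference and product), and the data is random. The second pair of index arrays is named p and q here only to keep it distinct from i and j.

```python
import numpy as np

def f(a, b):
    # Hypothetical stand-in for f: element-wise absolute difference.
    return np.abs(a - b)

def g(x, y):
    # Hypothetical stand-in for g: element-wise product.
    return x * y

rng = np.random.default_rng(0)
arr = rng.integers(25, 41, size=(15, 7))   # made-up data
n, m = arr.shape

i, j = np.mgrid[:m, :m].reshape((2, -1))
f_res = f(arr[:, i], arr[:, j])            # shape (n, m**2)

p, q = np.mgrid[:m**2, :m**2].reshape((2, -1))
g_res = g(f_res[:, p], f_res[:, q])        # shape (n, m**4)

# g_res_grid[:, a, b, c, d] == g(f(col a, col b), f(col c, col d))
g_res_grid = g_res.reshape((n, m, m, m, m))
```

Note that f_res[:, p] and f_res[:, q] are each materialized as full (n, m**4) arrays before g runs, so memory grows quickly with m.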

If you want the result of f(.., ..) shaped (n, m, m) then do:

f_res_grid = f_res.values.reshape((-1, m, m))

And if you want the result of g(.., ..) shaped (n, m, m, m, m) then do likewise:

g_res_grid = g_res.values.reshape((n, m, m, m, m))
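Since the question asks about broadcasting specifically, here is an alternative sketch that is not part of the answer above: instead of building index arrays, it inserts length-1 axes and lets NumPy broadcast. It assumes f and g are element-wise (the stand-ins below are hypothetical) and checks the result against the explicit 4-deep loop.

```python
import numpy as np

def f(a, b):
    return np.abs(a - b)   # hypothetical element-wise f

def g(x, y):
    return x * y           # hypothetical element-wise g

rng = np.random.default_rng(0)
arr = rng.integers(25, 41, size=(15, 7))   # made-up data
n, m = arr.shape

# f over all column pairs: (n, m, 1) against (n, 1, m) -> (n, m, m)
f_grid = f(arr[:, :, None], arr[:, None, :])

# g over all pairs of f results:
# (n, m, m, 1, 1) against (n, 1, 1, m, m) -> (n, m, m, m, m)
g_grid = g(f_grid[:, :, :, None, None], f_grid[:, None, None, :, :])

# Spot-check against the explicit nested loop.
expected = np.empty((n, m, m, m, m), dtype=g_grid.dtype)
cols = arr.T
for a, ci in enumerate(cols):
    for b, cj in enumerate(cols):
        for c, ck in enumerate(cols):
            for d, cm in enumerate(cols):
                expected[:, a, b, c, d] = g(f(ci, cj), f(ck, cm))

assert np.array_equal(g_grid, expected)
```

The broadcast inputs are views, so only the final (n, m, m, m, m) result is allocated, whereas fancy indexing copies both operands first.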

I hope you get the idea..

edited Jul 5, 2021 at 11:58

answered Jul 5, 2021 at 11:40

user9413641


4 Comments

RebeccaKennedy Over a year ago

Even faster on numpy!

RebeccaKennedy Over a year ago

My nested 2D loop was running at 25 ms - with that it is 12 ms!! I am gonna run the 4D now - should be good!

user9413641 Over a year ago

@RebeccaKennedy I'm glad I could help.

RebeccaKennedy Over a year ago

1.2 seconds, which isn't bad at all! There are lots of redundant combinations which I can get rid of, and that will hopefully improve things a bit more. Many thanks!
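One way the redundant combinations might be dropped, sketched under a loud assumption: this only applies if f is symmetric, i.e. f(a, b) == f(b, a), which the thread does not state. np.triu_indices then selects each unordered column pair once. The f below and the data are hypothetical.

```python
import numpy as np

def f(a, b):
    return np.abs(a - b)   # hypothetical symmetric f: f(a, b) == f(b, a)

rng = np.random.default_rng(0)
arr = rng.integers(25, 41, size=(15, 7))   # made-up data
n, m = arr.shape

# Index pairs with i <= j only: m*(m+1)//2 pairs instead of m**2.
i, j = np.triu_indices(m)
f_upper = f(arr[:, i], arr[:, j])          # shape (n, m*(m+1)//2)

# Rebuild the full (n, m, m) grid by mirroring, if it is needed.
f_grid = np.empty((n, m, m), dtype=f_upper.dtype)
f_grid[:, i, j] = f_upper
f_grid[:, j, i] = f_upper
```

For m = 7 this evaluates 28 pairs rather than 49; the same idea can be applied again to the pairs of f results if g is symmetric as well.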
