
Suppose I have the following NumPy array / pandas DataFrame:

| 0  | 1  | 2  | 3  | 4  | 5  | 6  |
| -- | -- | -- | -- | -- | -- | -- |
| 39 | 27 | 36 | 30 | 32 | 29 | 40 |
| 36 | 26 | 32 | 37 | 30 | 40 | 28 |
| 32 | 40 | 35 | 30 | 28 | 39 | 31 |
| 27 | 34 | 28 | 28 | 31 | 35 | 40 |
| 36 | 29 | 26 | 26 | 25 | 39 | 33 |
| 39 | 30 | 26 | 29 | 38 | 40 | 37 |
| 31 | 28 | 30 | 37 | 29 | 38 | 32 |
| 26 | 39 | 34 | 40 | 35 | 25 | 36 |
| 35 | 38 | 31 | 38 | 40 | 28 | 39 |
| 25 | 35 | 40 | 27 | 27 | 30 | 27 |
| 32 | 30 | 31 | 35 | 38 | 25 | 32 |
| 30 | 38 | 35 | 36 | 30 | 37 | 34 |
| 33 | 31 | 36 | 32 | 30 | 25 | 25 |
| 36 | 31 | 30 | 38 | 39 | 30 | 38 |
| 25 | 29 | 31 | 30 | 27 | 36 | 38 |

I want to run f(coli, colj) on each pair of columns, i.e. f(0,0), f(0,1), ..., f(0,6), ..., f(6,6), and obtain a 7x7 array (there are 7 columns, 0 to 6). I was able to achieve this relatively fast using a nested loop, which is OK. The problem I have run into is that I also need to compare the outcomes of f with each other, i.e. g(f(i,j), f(k,m)), which produces a 7x7x7x7 array. A 4D nested loop takes about a minute to run 😩.

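cols = array.T  # one row per original column
for i, col_i in enumerate(cols):
    for j, col_j in enumerate(cols):
        for k, col_k in enumerate(cols):
            for m, col_m in enumerate(cols):
                # index output with the integer positions, not the column arrays
                output[i, j, k, m] = g(f(col_i, col_j), f(col_k, col_m))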

Is there a faster way with broadcasting?

To rephrase the question: how would you apply a function to all possible pairs of columns to create a 2D array, and then take all possible pairs from that array and repeat the same process? Hope that makes sense 😊

  • Could you add a small example with input and output of what you want? Commented Jul 5, 2021 at 11:36

1 Answer


To rephrase the question, how would you perform a certain function by choosing all possible pairs from the column to create a 2D array..

Assuming you have a DataFrame df with shape (n, m), i.e.:

n, m = df.shape

Use np.mgrid to create indices (i, j) of all pairs of columns:

i, j = np.mgrid[:m, :m].reshape((2, -1))

(i and j now both have shape (m**2,))
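To make the index layout concrete, here is a minimal sketch with a made-up column count of `m = 3` (chosen only for illustration). It shows that position `p` in the flat arrays corresponds to the column pair `(p // m, p % m)`:

```python
import numpy as np

m = 3  # hypothetical small column count, for illustration only
i, j = np.mgrid[:m, :m].reshape((2, -1))

print(i)  # [0 0 0 1 1 1 2 2 2]
print(j)  # [0 1 2 0 1 2 0 1 2]

# Each flat position p encodes the pair (p // m, p % m).
pairs = list(zip(i.tolist(), j.tolist()))
print(pairs[5])  # (1, 2)
```

This row-major ordering is what makes the later reshape to an (m, m) grid line up with the pair indices.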

..and then index your dataframe using i and j for each arg to f respectively:

f_res = f(df.iloc[:, i], df.iloc[:, j])

(f_res now has shape (n, m**2))
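As a rough check of this step, the sketch below compares the indexed call against a plain double loop. It makes two assumptions that are not in the question: `f` is a hypothetical element-wise stand-in (a difference), and the data is random with the same shape as the table. It also indexes the underlying NumPy array rather than the DataFrame, because pandas binary operators align on column labels and the repeated labels produced by `df.iloc[:, i]` may get in the way, depending on what `f` does.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(25, 41, size=(15, 7))  # made-up data, same shape as the table
n, m = arr.shape

def f(a, b):
    # Hypothetical stand-in: any f built from element-wise NumPy ops fits here.
    return a - b

i, j = np.mgrid[:m, :m].reshape((2, -1))
f_res = f(arr[:, i], arr[:, j])        # shape (n, m**2)
f_res_grid = f_res.reshape((n, m, m))  # [:, a, b] holds f(col a, col b)

# Reference result from the plain double loop, for comparison.
looped = np.empty((n, m, m), dtype=arr.dtype)
for a in range(m):
    for b in range(m):
        looped[:, a, b] = f(arr[:, a], arr[:, b])

assert np.array_equal(f_res_grid, looped)
```

With a DataFrame, `df.to_numpy()` gives the array to index in the same way.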

Now you can repeat the same but on "f_res" for the args to g:

i, j = np.mgrid[:m**2, :m**2].reshape((2, -1))

(i and j now both have shape (m**4,))

g_res = g(f_res.iloc[:, i], f_res.iloc[:, j])

(g_res now has shape (n, m**4))

If you want the result of f(.., ..) shaped (n, m, m) then do:

f_res_grid = f_res.values.reshape((-1, m, m))

And if you want the result of g(.., ..) shaped (n, m, m, m, m) then do likewise:

g_res_grid = g_res.values.reshape((n, m, m, m, m))
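Putting the steps together, here is a minimal end-to-end sketch on a NumPy array. The functions `f` and `g` are hypothetical element-wise stand-ins and the data is random, so treat it as an illustration of the indexing rather than of the real computation. The last part shows an alternative that is not from the answer above: when `g` is element-wise, plain broadcasting over the (n, m, m) grid gives the same result without building the m**4 index arrays.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(25, 41, size=(15, 7))  # made-up data, same shape as the table
n, m = arr.shape

def f(a, b):  # hypothetical element-wise stand-in
    return a - b

def g(x, y):  # hypothetical element-wise stand-in
    return x * y

# Step 1: f over all column pairs.
i, j = np.mgrid[:m, :m].reshape((2, -1))
f_res = f(arr[:, i], arr[:, j])              # shape (n, m**2)

# Step 2: g over all pairs of f results.
p, q = np.mgrid[:m**2, :m**2].reshape((2, -1))
g_res = g(f_res[:, p], f_res[:, q])          # shape (n, m**4)
g_res_grid = g_res.reshape((n, m, m, m, m))

# Spot check one entry against the direct nested call.
a, b, c, d = 1, 4, 6, 2
expected = g(f(arr[:, a], arr[:, b]), f(arr[:, c], arr[:, d]))
assert np.array_equal(g_res_grid[:, a, b, c, d], expected)

# Alternative (assumes g broadcasts): insert singleton axes instead of indexing.
f_grid = f_res.reshape((n, m, m))
g_bcast = g(f_grid[:, :, :, None, None], f_grid[:, None, None, :, :])
assert np.array_equal(g_bcast, g_res_grid)
```

If `f` or `g` reduces over the rows instead (for example a mean along `axis=0`), the leading `n` axis disappears and the reshapes become `(m, m)` and `(m, m, m, m)`.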

I hope you get the idea..


4 Comments

Even faster on numpy!
My nested 2D loop was running at 25 ms; with this it takes 12 ms!! I am going to run the 4D version now, it should be good!
@RebeccaKennedy I'm glad I could help.
1.2 seconds, which isn't bad at all! There are lots of redundant combinations which I can get rid of, and that will hopefully improve things a bit more. Many thanks!
