1

I'm trying to vectorize the following equation. I have it working in non-vectorized form

Variances

print(K)
print(N)
print(gamma.shape)
print(means.shape)
print(img.shape)
print(N_k.shape)

yields

2
1000
(1000, 2)
(2,)
(1000)
(2,)



self.variances = np.sum(gamma * (img  - self.means) ** 2) / N_k

This gives me the wrong answer. The answer has the correct shape but the values are way off

However

self.variances = [ np.sum([gamma[n][k] * (img[n] - self.means[k]) ** 2 for n in range(0, N)]) / N_k[k] for k in range(0, K) ]

works. I'm new to numpy so I'm sure I've made a stupid mistake but I'm unable to find it in this case

Thanks

0

1 Answer 1

3

Use

self.variances = np.sum(gamma * (img[:, None] - self.means) ** 2, axis=0) / N_k

instead of

self.variances = np.sum(gamma * (img - self.means) ** 2) / N_k

import numpy as np

N, K = 10, 20
gamma = np.random.random((N, K))
means = np.random.random(K)
N_k = np.random.random(K)
img = np.random.random(N)

expected = np.array([ np.sum([gamma[n][k] * (img[n] - means[k]) ** 2 for n in range(0, N)]) / N_k[k] for k in range(0, K) ])

result = np.sum(gamma * (img[:, None]  - means) ** 2, axis=0) / N_k

assert np.allclose(result, expected)

Note that img - self.means subtracts corresponding values elementwise. img has shape (N,) and self.means has shape (K,), so subtracting one from the other is possible if N == K, but raises an ValueError if N != K since then the shapes are incompatible. Since you didn't get an error, N must be equal to K.

Since you want to compute a 2D array of values, first make img a 2D array of shape (N, 1) by adding a new axis: img[:, None].

Then take advantage of broadcasting so that when you subtract the 1D array self.means, it will broadcast to a 2D array of compatible shape (N, K). Broadcasting by default adds new axes on the left. Thus the array of shape (K,) can automatically broadcast to an array of shape (N, K). (This also explains why we needed to use img[:, None] to explicitly add a new axis on the right.)

Now (img[:, None] - self.means) will be a 2D array of shape (N, K).


Also note that it is important to specify axis=0 when calling np.sum because the summation is done over the first axis of length N. Since Python uses 0-based indexing the first axis corresponds to axis=0.

If you don't specify an axis, then np.sum sums over all axes by default.

Sign up to request clarification or add additional context in comments.

2 Comments

Or with np.einsum : diffs = img[:,None] - self_means ; out = np.einsum('ij,ij,ij->j',gamma,diffs,diffs)/N_k. Could be faster.
Well it turns out that img is actually (N, 1) in shape. This code was enough to get me in the right direction. Thanks

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.