Numpy vectorization error

Question

I'm trying to vectorize the following equation. I have it working in non-vectorized form

print(K)
print(N)
print(gamma.shape)
print(means.shape)
print(img.shape)
print(N_k.shape)

yields

2
1000
(1000, 2)
(2,)
(1000)
(2,)



self.variances = np.sum(gamma * (img  - self.means) ** 2) / N_k

This gives me the wrong answer. The answer has the correct shape but the values are way off

However

self.variances = [ np.sum([gamma[n][k] * (img[n] - self.means[k]) ** 2 for n in range(0, N)]) / N_k[k] for k in range(0, K) ]

works. I'm new to numpy so I'm sure I've made a stupid mistake but I'm unable to find it in this case

Thanks

unutbu · Accepted Answer · 2016-04-03 15:13:35Z

3

Use

self.variances = np.sum(gamma * (img[:, None] - self.means) ** 2, axis=0) / N_k

instead of

self.variances = np.sum(gamma * (img - self.means) ** 2) / N_k

import numpy as np

N, K = 10, 20
gamma = np.random.random((N, K))
means = np.random.random(K)
N_k = np.random.random(K)
img = np.random.random(N)

expected = np.array([ np.sum([gamma[n][k] * (img[n] - means[k]) ** 2 for n in range(0, N)]) / N_k[k] for k in range(0, K) ])

result = np.sum(gamma * (img[:, None]  - means) ** 2, axis=0) / N_k

assert np.allclose(result, expected)

Note that img - self.means subtracts corresponding values elementwise. img has shape (N,) and self.means has shape (K,), so subtracting one from the other is possible if N == K, but raises an ValueError if N != K since then the shapes are incompatible. Since you didn't get an error, N must be equal to K.

Since you want to compute a 2D array of values, first make img a 2D array of shape (N, 1) by adding a new axis: img[:, None].

Then take advantage of broadcasting so that when you subtract the 1D array self.means, it will broadcast to a 2D array of compatible shape (N, K). Broadcasting by default adds new axes on the left. Thus the array of shape (K,) can automatically broadcast to an array of shape (N, K). (This also explains why we needed to use img[:, None] to explicitly add a new axis on the right.)

Now (img[:, None] - self.means) will be a 2D array of shape (N, K).

Also note that it is important to specify axis=0 when calling np.sum because the summation is done over the first axis of length N. Since Python uses 0-based indexing the first axis corresponds to axis=0.

If you don't specify an axis, then np.sum sums over all axes by default.

edited Apr 3, 2016 at 15:13

answered Apr 3, 2016 at 14:39

unutbu

886k197 gold badges1.9k silver badges1.7k bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Divakar Over a year ago

Or with np.einsum : diffs = img[:,None] - self_means ; out = np.einsum('ij,ij,ij->j',gamma,diffs,diffs)/N_k. Could be faster.

Shaun Over a year ago

Well it turns out that img is actually (N, 1) in shape. This code was enough to get me in the right direction. Thanks

Collectives™ on Stack Overflow

Numpy vectorization error

1 Answer 1

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related