Numpy operation for euclidean distance between multidimensional arrays

Question

I have two numpy arrays. 'A' of size w,h,2 and 'B' with n,2. In other words, A is a 2-dimensional array of 2D vectors while B is a 1D array of 2D vectors. What i want as a result is an array of size w,h,n. The last dimension is an n-dimensional vector where each of the components is the euclidean distance between the corresponding vector from A (denoted by the first two dimensions w and h) and the nth vector of B.

I know that i can just loop through w, h and n in python manually and calculate the distance for each element, but i like to know if there is a smart way to do that with numpy operations to increase performance.

I found some similar questions but unfortunately all of those use input arrays of the same dimensionality.

Divakar · Accepted Answer · 2016-12-18 13:49:43Z

1

Approach #1

You could reshape A to 2D, use Scipy's cdist that expects 2D arrays as inputs, get those euclidean distances and finally reshape back to 3D.

Thus, an implementation would be -

from scipy.spatial.distance import cdist

out = cdist(A.reshape(-1,2),B).reshape(w,h,-1)

Approach #2

Since, the axis of reduction is of length 2 only, we can just slice the input arrays to save memory on intermediate arrays, like so -

np.sqrt((A[...,0,None] - B[:,0])**2 + (A[...,1,None] - B[:,1])**2)

Explanation on A[...,0,None] and A[...,1,None] :

With that None we are just introducing a new axis at the end of sliced A. Well, let's take a small example -

In [54]: A = np.random.randint(0,9,(4,5,2))

In [55]: A[...,0].shape
Out[55]: (4, 5)

In [56]: A[...,0,None].shape
Out[56]: (4, 5, 1)

In [57]: B = np.random.randint(0,9,(3,2))

In [58]: B[:,0].shape
Out[58]: (3,)

So, we have :

A[...,0,None]  : 4 x 5 x 1 
B[:,0]         :         3

That is essentially :

A[...,0,None]  : 4 x 5 x 1 
B[:,0]         : 1 x 1 x 3

When the subtraction is performed, the singleton dims are broadcasted corresponding to the dimensions of the other participating arrays -

A[...,0,None] - B  : 4 x 5 x 3

We repeat this for the second index along the last axis. We add these two arrays after squaring and finally a square-root to get the final eucl. distances.

edited Dec 18, 2016 at 13:49

answered Dec 18, 2016 at 13:31

Divakar

222k19 gold badges273 silver badges374 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

Mario Dekena Over a year ago

Wow. First approach is actually quite easy. Didn't think of that! Regarding approach 2, i am fairly new to numpy, could you elaborate on the slice and index operations? The [...,None,1] thing is really confusing.

Divakar Over a year ago

@MarioDekena A[...,0,None] would look more intuitive to what it essentially does. Edited with few comments.

Mario Dekena Over a year ago

Thank you very much! Seems like i still have a lot to learn regarding numpy ;)

Divakar Over a year ago

@MarioDekena Just curious how fast is approach#2 compared to app#1 on your use-case?

Mario Dekena Over a year ago

Gonna do some performance testing soon. I will post the results.

|

Collectives™ on Stack Overflow

Numpy operation for euclidean distance between multidimensional arrays

1 Answer 1

7 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

7 Comments

Your Answer

Sign up or log in

Post as a guest

Related