Implementing a Euclidean-distance-based formula using NumPy

Question

I am trying to implement this formula in Python using NumPy.

As you can see in the picture above, X is a NumPy matrix whose rows x_i are n-dimensional vectors, and C is also a NumPy matrix whose rows C_i are n-dimensional vectors. dist(C_i, x_i) is the Euclidean distance between two such vectors. I implemented it in Python as follows:

import math
import numpy as np

value = 0
for i in range(X.shape[0]):
    min_value = math.inf
    # this inner loop iterates k times (once per row of C)
    for j in range(C.shape[0]):
        diff = X[i] - C[j]
        distance = np.dot(diff, diff) ** .5
        min_value = min(min_value, distance)
    value += min_value
fitnessValue = value
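
Written out, the loops above appear to compute the following quantity. The symbols m and k (the number of rows of X and of C) and the index d are introduced here only for illustration:

```latex
\text{fitnessValue} = \sum_{i=1}^{m} \min_{1 \le j \le k} \operatorname{dist}(C_j, x_i),
\qquad
\operatorname{dist}(C_j, x_i) = \lVert x_i - C_j \rVert_2
  = \sqrt{\sum_{d=1}^{n} \left(x_{i,d} - C_{j,d}\right)^2}
```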

But the performance of my code is not good enough, and I am looking for something faster. Is there a faster way to calculate this formula in Python? Any ideas would be appreciated.

Depending on the size of X and C, you might get better performance by first building a KD-Tree. Building the tree is O(n log n). Finding the nearest point in the KDTree is O(log n).
– unutbu, Commented Dec 8, 2016 at 16:04
Scipy is your friend: docs.scipy.org/doc/scipy-0.14.0/reference/…
– Benjamin, Commented Dec 8, 2016 at 16:04
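
The KD-tree suggestion in the comment above could look roughly like the sketch below. It assumes scipy.spatial.cKDTree is available; the function name sum_of_min_distances_kdtree and the random test data are made up for illustration and are not from the original posts.

```python
import numpy as np
from scipy.spatial import cKDTree


def sum_of_min_distances_kdtree(X, C):
    """Sum, over the rows of X, of the Euclidean distance to the nearest row of C.

    Hypothetical helper name, used only for this sketch.
    """
    tree = cKDTree(C)                      # build the tree once over the rows of C
    nearest_dist, _ = tree.query(X, k=1)   # nearest-row distance for every row of X
    return nearest_dist.sum()


# Made-up data: 1000 points and 10 centres in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.random((1000, 3))
C = rng.random((10, 3))
kdtree_value = sum_of_min_distances_kdtree(X, C)
print(kdtree_value)
```

Whether this beats a dense distance matrix probably depends on the sizes involved: the tree tends to pay off when C has many rows and the dimension n is small.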

PAb · Accepted Answer · 2016-12-08 16:12:58Z

In general, Python loops that run a large number of times should be avoided whenever possible.

Here, there exists a scipy function, scipy.spatial.distance.cdist(C, X), which computes the pairwise distance matrix between C and X. That is to say, if you call distance_matrix = scipy.spatial.distance.cdist(C, X), you have distance_matrix[i, j] = dist(C_i, X_j).

Then, for each j, you want the minimum of dist(C_i, X_j) over all i. You do not need a loop for this either! The function numpy.min does it for you if you pass an axis argument (here axis=0, since i indexes the rows of the distance matrix).

And finally, the sum of all these minima is computed by calling the numpy.sum function.
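
As a small illustration of these last two steps, here is a sketch on a made-up 2 x 3 matrix standing in for the cdist output (the numbers are invented for the example):

```python
import numpy as np

# Stand-in for distance_matrix[i, j] = dist(C_i, X_j): 2 rows of C, 3 rows of X.
distance_matrix = np.array([[1.0, 5.0, 2.0],
                            [4.0, 3.0, 6.0]])

# Minimum over i (the rows) for each j (the columns).
column_minima = distance_matrix.min(axis=0)   # array([1., 3., 2.])

# Sum of the per-column minima.
total = column_minima.sum()                   # 6.0
print(column_minima, total)
```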

Putting the three steps together gives code that is more readable and much faster:

import scipy.spatial.distance
import numpy as np
def your_function(C, X):
    distance_matrix = scipy.spatial.distance.cdist(C, X)
    minimum = np.min(distance_matrix, axis=0)
    return np.sum(minimum)

Which returns the same results as your function :) Hope this helps!
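
One way to convince yourself that the two versions agree is to run both on random data. The sketch below wraps the loop from the question in a function; the names loop_version and vectorized_version and the test data are assumptions made for this example.

```python
import math

import numpy as np
import scipy.spatial.distance


def loop_version(C, X):
    # Same double loop as in the question, wrapped in a function.
    value = 0.0
    for i in range(X.shape[0]):
        min_value = math.inf
        for j in range(C.shape[0]):
            diff = X[i] - C[j]
            min_value = min(min_value, np.dot(diff, diff) ** 0.5)
        value += min_value
    return value


def vectorized_version(C, X):
    # cdist gives a (rows of C) x (rows of X) matrix; reduce over the rows of C.
    return scipy.spatial.distance.cdist(C, X).min(axis=0).sum()


rng = np.random.default_rng(42)
X = rng.random((200, 4))   # made-up data: 200 points in 4 dimensions
C = rng.random((5, 4))     # made-up data: 5 centres

loop_result = loop_version(C, X)
vectorized_result = vectorized_version(C, X)
print(np.isclose(loop_result, vectorized_result))
```

The comparison uses np.isclose rather than == because the two versions accumulate floating-point rounding differently.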

NaN · Accepted Answer · 2016-12-08 22:16:58Z

numpy.einsum can also be called into play. Here is a simple example of a pairwise distance calculation for a small set. It is useful if you don't have scipy installed and/or wish to use numpy only.

>>> a
array([[ 0.,  0.],
       [ 1.,  1.],
       [ 2.,  2.],
       [ 3.,  3.],
       [ 4.,  4.]])
>>> b = a.reshape(np.prod(a.shape[:-1]),1,a.shape[-1])
>>> b
array([[[ 0.,  0.]],

       [[ 1.,  1.]],

       [[ 2.,  2.]],

       [[ 3.,  3.]],

       [[ 4.,  4.]]])
>>> diff = a - b
>>> dist_arr = np.sqrt(np.einsum('ijk,ijk->ij', diff, diff)).squeeze()
>>> dist_arr
array([[ 0.     ,  1.41421,  2.82843,  4.24264,  5.65685],
       [ 1.41421,  0.     ,  1.41421,  2.82843,  4.24264],
       [ 2.82843,  1.41421,  0.     ,  1.41421,  2.82843],
       [ 4.24264,  2.82843,  1.41421,  0.     ,  1.41421],
       [ 5.65685,  4.24264,  2.82843,  1.41421,  0.     ]])

Array 'a' is a simple 2D array (shape=(5, 2)), and 'b' is just 'a' reshaped to (5, 1, 2) to facilitate the difference calculations for the cdist-style array. The terms are written verbosely since they are extracted from other code. The 'diff' variable is the difference array, and the dist_arr shown is for the 'euclidean' distance. Should you need euclideansq (squared distance) for 'closest' determinations, simply remove the np.sqrt term. Finally, squeeze just removes any length-1 dimensions from the shape.

cdist is faster for much larger arrays (on the order of thousands of origins and destinations), but einsum is a nice alternative and is well documented by others on this site.
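
Applied to the X and C of the question, the einsum approach might look like the sketch below. The function name sum_of_min_distances_einsum and the tiny test arrays are made up for illustration. The minimum is taken on squared distances before the square root, which should give the same result because the square root is monotonic.

```python
import numpy as np


def sum_of_min_distances_einsum(C, X):
    # Hypothetical helper name, used only for this sketch.
    # Broadcast to shape (rows of X, rows of C, n): every x_i minus every C_j.
    diff = X[:, np.newaxis, :] - C[np.newaxis, :, :]
    # Squared Euclidean distances, shape (rows of X, rows of C).
    sq_dist = np.einsum('ijk,ijk->ij', diff, diff)
    # Nearest centre per row of X, then square root, then sum.
    return np.sqrt(sq_dist.min(axis=1)).sum()


# Made-up data: nearest distances are 0, 4 and sqrt(2).
X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
C = np.array([[0.0, 0.0], [3.0, 0.0]])
result = sum_of_min_distances_einsum(C, X)
print(result)
```

Note that the intermediate diff array holds (rows of X) x (rows of C) x n values, so memory use grows quickly for large inputs.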
