1

I am trying to implement this formula in python using numpy

[Image: the formula to implement]

As the picture above shows, X is a NumPy matrix whose rows x_i are n-dimensional vectors, and C is also a NumPy matrix whose rows C_i are n-dimensional vectors. dist(C_i, x_i) is the Euclidean distance between two such vectors. I implemented it in Python:

import math
import numpy as np

value = 0
for i in range(X.shape[0]):
    min_value = math.inf
    # this inner loop iterates k times (once per row of C)
    for j in range(C.shape[0]):
        diff = X[i] - C[j]
        distance = np.dot(diff, diff) ** .5  # Euclidean distance
        min_value = min(min_value, distance)
    value += min_value
fitnessValue = value
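
The formula image is not reproduced in the text. Reading it off the loop above, the quantity being computed is presumably the following, where m (the number of rows of X) and k (the number of rows of C) are names assumed here for illustration:

```latex
\text{fitness}(X, C) \;=\; \sum_{i=1}^{m} \; \min_{1 \le j \le k} \operatorname{dist}(C_j, x_i),
\qquad
\operatorname{dist}(C_j, x_i) \;=\; \sqrt{(x_i - C_j) \cdot (x_i - C_j)}
```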

But my code's performance is not good enough, and I am looking for something faster. Is there a faster way to calculate this formula in Python? Any ideas would be appreciated.

2
  • Depending on the size of X and C, you might get better performance by first building a KD-Tree. Building the tree is O(n log n). Finding the nearest point in the KDTree is O(log n). Commented Dec 8, 2016 at 16:04
  • Scipy is your friend: docs.scipy.org/doc/scipy-0.14.0/reference/… Commented Dec 8, 2016 at 16:04
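
The KD-tree suggestion in the comments could look roughly like the sketch below. It assumes SciPy's scipy.spatial.cKDTree is what the commenter had in mind; the function name fitness_kdtree and the toy arrays are made up for illustration. The tree is built on C, then queried once with all rows of X.

```python
import numpy as np
from scipy.spatial import cKDTree


def fitness_kdtree(X, C):
    """Sum, over the rows of X, of the Euclidean distance to the nearest row of C."""
    tree = cKDTree(C)                      # build the tree on the k rows of C
    nearest_dist, _ = tree.query(X, k=1)   # nearest-neighbour distance for each row of X
    return nearest_dist.sum()


# Toy data (hypothetical): two centres, three points.
C = np.array([[0.0, 0.0], [10.0, 0.0]])
X = np.array([[1.0, 0.0], [9.0, 0.0], [0.0, 3.0]])
result = fitness_kdtree(X, C)
print(result)
```

Whether this beats a brute-force distance matrix depends on the sizes of X and C and on the dimension n; KD-trees generally lose their advantage as n grows.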

2 Answers

5

Generally, loops that run a large number of times should be avoided in Python when possible.

Here, there exists a scipy function, scipy.spatial.distance.cdist(C, X), which computes the pairwise distance matrix between C and X. That is to say, if you call distance_matrix = scipy.spatial.distance.cdist(C, X), you have distance_matrix[i, j] = dist(C_i, X_j).

Then, for each j, you want to compute the minimum of dist(C_i, X_j) over all i. You do not need a loop for this either! The function numpy.min does it for you if you pass an axis argument.

Finally, the summation of all these minima is done by calling the numpy.sum function.

This gives code that is both more readable and much faster:

import scipy.spatial.distance
import numpy as np

def your_function(C, X):
    # distance_matrix[i, j] = dist(C_i, X_j)
    distance_matrix = scipy.spatial.distance.cdist(C, X)
    # for each X_j, the distance to its nearest C_i
    minimum = np.min(distance_matrix, axis=0)
    return np.sum(minimum)

Which returns the same results as your function :) Hope this helps!
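
As a quick sanity check, here is a small sketch comparing the loop from the question with the cdist version on toy data. The names fitness_loop and fitness_cdist and the arrays are made up for illustration. With these inputs the nearest-centre distances are 1, 1 and 3, so both versions should give 5.

```python
import math

import numpy as np
from scipy.spatial.distance import cdist


def fitness_loop(X, C):
    """Double loop, as in the question."""
    total = 0.0
    for x in X:
        best = math.inf
        for c in C:
            d = x - c
            best = min(best, math.sqrt(np.dot(d, d)))
        total += best
    return total


def fitness_cdist(X, C):
    """Vectorised version: pairwise distances, min over centres, then sum."""
    return cdist(C, X).min(axis=0).sum()


C = np.array([[0.0, 0.0], [10.0, 0.0]])
X = np.array([[1.0, 0.0], [9.0, 0.0], [0.0, 3.0]])

D = cdist(C, X)  # one row per centre, one column per point: shape (2, 3)
loop_value = fitness_loop(X, C)
cdist_value = fitness_cdist(X, C)
print(D.shape, loop_value, cdist_value)
```

Note that cdist(C, X) materialises the full k-by-m matrix, so memory use grows with the product of the two row counts.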


1

einsum can also be called into play. Here is a simple small example of a pairwise distance calculation for a small set. Useful if you don't have scipy installed and/or wish to use numpy solely.

>>> a
array([[ 0.,  0.],
       [ 1.,  1.],
       [ 2.,  2.],
       [ 3.,  3.],
       [ 4.,  4.]])
>>> b = a.reshape(np.prod(a.shape[:-1]),1,a.shape[-1])
>>> b
array([[[ 0.,  0.]],

       [[ 1.,  1.]],

       [[ 2.,  2.]],

       [[ 3.,  3.]],

       [[ 4.,  4.]]])
>>> diff = a - b
>>> dist_arr = np.sqrt(np.einsum('ijk,ijk->ij', diff, diff)).squeeze()
>>> dist_arr
array([[ 0.     ,  1.41421,  2.82843,  4.24264,  5.65685],
       [ 1.41421,  0.     ,  1.41421,  2.82843,  4.24264],
       [ 2.82843,  1.41421,  0.     ,  1.41421,  2.82843],
       [ 4.24264,  2.82843,  1.41421,  0.     ,  1.41421],
       [ 5.65685,  4.24264,  2.82843,  1.41421,  0.     ]])

Array 'a' is a simple 2D array (shape=(5, 2)); 'b' is just 'a' reshaped to (5, 1, 2) to facilitate the difference calculations for the cdist-style array. The terms are written verbosely since they are extracted from other code. The 'diff' variable is the difference array, and the dist_arr shown is for the 'euclidean' distance. Should you need the squared Euclidean distance ('sqeuclidean' in cdist terms) for 'closest' determinations, simply remove the np.sqrt term. Finally, squeeze just removes any length-1 dimensions from the shape.
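
Applied to the X and C from the question, the same einsum idea might be sketched as follows. The function name fitness_einsum and the toy arrays are made up for illustration. Because the square root is monotonic, the minimum is taken on squared distances and the root is applied only to the m winners.

```python
import numpy as np


def fitness_einsum(X, C):
    """NumPy-only version: broadcast differences, einsum for squared distances."""
    diff = X[:, np.newaxis, :] - C[np.newaxis, :, :]   # shape (m, k, n)
    sq_dist = np.einsum('ijk,ijk->ij', diff, diff)      # shape (m, k), squared distances
    return np.sqrt(sq_dist.min(axis=1)).sum()           # nearest centre per row of X


C = np.array([[0.0, 0.0], [10.0, 0.0]])
X = np.array([[1.0, 0.0], [9.0, 0.0], [0.0, 3.0]])
einsum_value = fitness_einsum(X, C)
print(einsum_value)
```

The intermediate diff array has m * k * n elements, so for large inputs this trades memory for the convenience of staying within NumPy.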

cdist is faster for much larger arrays (on the order of thousands of origins and destinations), but einsum is a nice alternative and is well documented by others on this site.
