Pythonic way to calculate distance using numpy matrices?

Question

I have a list of points in a numpy matrix,

A = [[x11,x12,x13],[x21,x22,x23] ]

and I have an origin point o = [o1,o2,o3] from which I have to compute the distance to every point.

A - o subtracts o from every point. Currently I square every coordinate and add them up in a for loop. Is there a more intuitive way to do this?

P.S.: I am doing the above calculation as part of a k-means clustering application. I have computed the centroids and now I have to compute the distance from every point to each centroid.

input_mat = input_data_per_minute.values[:,2:5]

scaled_input_mat = scale2(input_mat)

k_means = cluster.KMeans(n_clusters=5)

print('training start')
k_means.fit(scaled_input_mat)
print('training over')

out = k_means.cluster_centers_

I have to compute the distance between input_mat and each cluster centroid.

Look into cdist from scipy.

– Divakar, Commented Mar 12, 2016 at 8:12

MSeifert · Accepted Answer · 2016-03-14 03:06:39Z

Numpy solution:

NumPy is great with broadcasting, so you can get it to compute all distances in one step. But this can consume a lot of memory, depending on the number of points and cluster centers, because it creates an intermediate array of shape (number_of_points, number_of_cluster_centers, 3).

First you need to know a bit about broadcasting. I'll play it safe and define each dimension by hand.

I'll start by defining some points and centers for illustration purposes:

import numpy as np

points = np.array([[1,1,1],
                   [2,1,1],
                   [1,2,1],
                   [5,5,5]])

centers = np.array([[1.5, 1.5, 1],
                    [5,5,5]])

Now I'll prepare these arrays so that I can use numpy broadcasting to get the distance in each dimension:

distance_3d = points[:,None,:] - centers[None,:,:]

Effectively, the first dimension is now the point "label", the second dimension is the center "label", and the third dimension is the coordinate. The subtraction gives the signed difference along each coordinate. The result will have the shape:

(number_of_points, number_of_cluster_centers, 3)
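To make the broadcasting step concrete, here is a small sketch that prints the shapes of the two reshaped operands and of the result, using the same test data as above. The names p, c and diff are only illustrative, and the byte count assumes NumPy's default float64 result type.

```python
import numpy as np

points = np.array([[1, 1, 1], [2, 1, 1], [1, 2, 1], [5, 5, 5]])
centers = np.array([[1.5, 1.5, 1], [5, 5, 5]])

p = points[:, None, :]    # shape (4, 1, 3): one row of coordinates per point
c = centers[None, :, :]   # shape (1, 2, 3): one row of coordinates per center
diff = p - c              # broadcast to shape (4, 2, 3)

print(p.shape, c.shape, diff.shape)
# Memory used by the intermediate array: 4 * 2 * 3 elements * 8 bytes each
print(diff.nbytes)
```

The size of this intermediate array grows with number_of_points * number_of_cluster_centers, which is where the memory cost mentioned above comes from.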

Now it's only a matter of applying the formula for the Euclidean distance:

# Square each distance
distance_3d_squared = distance_3d ** 2

# Sum the squared differences over the coordinates (the result will be 2D)
distance_sum = np.sum(distance_3d_squared, axis=2)

# And take the square root
distance = np.sqrt(distance_sum)

For my test data the final result is:

#array([[ 0.70710678,  6.92820323],
#       [ 0.70710678,  6.40312424],
#       [ 0.70710678,  6.40312424],
#       [ 6.36396103,  0.        ]])

So the distance[i, j] element will give you the distance of point i to the center j.
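Since the question is about k-means, a natural follow-up is to pick the closest center for each point. The sketch below assumes the same test data; nearest and nearest_distance are names introduced here for illustration only.

```python
import numpy as np

points = np.array([[1, 1, 1], [2, 1, 1], [1, 2, 1], [5, 5, 5]])
centers = np.array([[1.5, 1.5, 1], [5, 5, 5]])

# distance[i, j] is the distance from point i to center j
distance = np.sqrt(((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2))

# Index of the closest center for every point (minimum along the center axis)
nearest = distance.argmin(axis=1)
# Distance from every point to its closest center
nearest_distance = distance.min(axis=1)

print(nearest)
print(nearest_distance)
```

For this data the first three points are closest to center 0 and the last point coincides with center 1.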

Summary:

You can put all of this in one line:

distance2 = np.sqrt(np.sum((points[:,None,:] - centers[None,:,:]) ** 2, axis=2))
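As a possible variation, the square, sum and square root can be delegated to np.linalg.norm, which computes the Euclidean norm along a given axis. This is only a sketch of an equivalent formulation, not something the steps above require; distance_norm is an illustrative name.

```python
import numpy as np

points = np.array([[1, 1, 1], [2, 1, 1], [1, 2, 1], [5, 5, 5]])
centers = np.array([[1.5, 1.5, 1], [5, 5, 5]])

# Explicit one-liner: square, sum over the coordinate axis, square root
distance2 = np.sqrt(np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=2))

# Same computation, letting norm() reduce over the coordinate axis
distance_norm = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)

print(np.allclose(distance2, distance_norm))
```

It still builds the same 3D intermediate array, so the memory cost is unchanged.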

Scipy solution (faster & shorter):

Or, if you have SciPy, use cdist:

from scipy.spatial.distance import cdist
distance3 = cdist(points, centers)

The result will be the same (up to floating-point rounding), but cdist is the fastest for lots of points and centers.
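If SciPy is not available and the full intermediate array of the broadcasting approach is too large for memory, one option is to process the points in chunks. The following is a minimal sketch under that assumption; chunked_distances and chunk_size are hypothetical names, not part of NumPy or SciPy.

```python
import numpy as np

def chunked_distances(points, centers, chunk_size=1000):
    """Euclidean distances from every point to every center,
    computed chunk_size points at a time to limit peak memory."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    out = np.empty((points.shape[0], centers.shape[0]))
    for start in range(0, points.shape[0], chunk_size):
        stop = start + chunk_size
        # Intermediate array is at most (chunk_size, n_centers, n_coords)
        diff = points[start:stop, None, :] - centers[None, :, :]
        out[start:stop] = np.sqrt((diff ** 2).sum(axis=2))
    return out

points = np.array([[1, 1, 1], [2, 1, 1], [1, 2, 1], [5, 5, 5]])
centers = np.array([[1.5, 1.5, 1], [5, 5, 5]])
print(chunked_distances(points, centers, chunk_size=3))
```

The chunk size trades memory for the number of Python-level loop iterations, so larger chunks are usually faster as long as they fit in memory.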

willnx · Answer · 2016-03-12 08:26:38Z

You should be able to do something like this: (assuming I read your question right ;) )

In [1]: import numpy as np

In [2]: a = np.array([[11,12,13],[21,22,23]])

In [3]: o = [1,2,3]

In [4]: a - o  # just showing
Out[4]: 
array([[10, 10, 10],
       [20, 20, 20]])

In [5]: a ** 2  # just showing
Out[5]: 
array([[121, 144, 169],
       [441, 484, 529]])

In [6]: b = (a ** 2) + (a - o)

In [7]: b
Out[7]: 
array([[131, 154, 179],
       [461, 504, 549]])

NumPy is great because it operates on arrays element-wise! This means that 90+% of the time you can process the array without a for-loop. Using a for-loop over the array is also significantly slower.
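The session above demonstrates element-wise arithmetic but does not compute the distance itself. A minimal sketch of how the same element-wise operations could be combined into the Euclidean distance from the origin o to every row of a (dist is an illustrative name):

```python
import numpy as np

a = np.array([[11, 12, 13], [21, 22, 23]])
o = np.array([1, 2, 3])

# Subtract the origin, square each coordinate difference,
# sum over the coordinates of each row, then take the square root
dist = np.sqrt(((a - o) ** 2).sum(axis=1))

print(dist)
```

Here the per-coordinate differences are 10 and 20, so the two distances are sqrt(300) and sqrt(1200).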
