Numpy solution:
NumPy's broadcasting lets you compute all distances in one step, but it can consume a lot of memory depending on the number of points and cluster centers: it creates an intermediate array of shape (number_of_points, number_of_cluster_centers, 3).
First you need to know a bit about broadcasting. I'll play it safe and define each dimension by hand.
I'll start by defining some points and centers for illustration purposes:
import numpy as np
points = np.array([[1, 1, 1],
                   [2, 1, 1],
                   [1, 2, 1],
                   [5, 5, 5]])
centers = np.array([[1.5, 1.5, 1],
                    [5, 5, 5]])
Now I'll prepare these arrays so that I can use numpy broadcasting to get the distance in each dimension:
distance_3d = points[:,None,:] - centers[None,:,:]
Effectively, the first dimension is now the point "label", the second dimension is the center "label", and the third dimension is the coordinate. The subtraction gives the signed difference along each coordinate. The result will have the shape:
(number_of_points, number_of_cluster_centers, 3)
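To make the memory cost concrete, here is a small sketch that checks the shape and size of the intermediate array for the test data above. The larger sizes at the end (n_points, n_centers) are hypothetical numbers I picked for illustration, and the estimate assumes float64 (8 bytes per element):

```python
import numpy as np

points = np.array([[1, 1, 1], [2, 1, 1], [1, 2, 1], [5, 5, 5]])
centers = np.array([[1.5, 1.5, 1], [5, 5, 5]])

# Insert length-1 axes so the two arrays broadcast against each other
distance_3d = points[:, None, :] - centers[None, :, :]

print(distance_3d.shape)   # (4, 2, 3)
print(distance_3d.nbytes)  # 192 = 4 * 2 * 3 elements * 8 bytes each

# Hypothetical larger problem: estimate the size without allocating anything
n_points, n_centers = 100_000, 1_000
estimated_bytes = n_points * n_centers * 3 * 8
print(estimated_bytes / 1e9, "GB")  # 2.4 GB
```

So the intermediate array grows with the product of the two counts, which is why this approach can run out of memory long before the inputs themselves get large.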
Now it's only a matter of applying the formula for the Euclidean distance:
# Square each distance
distance_3d_squared = distance_3d ** 2
# Sum the squared differences over the coordinate axis (the result will be 2D)
distance_sum = np.sum(distance_3d_squared, axis=2)
# And take the square root
distance = np.sqrt(distance_sum)
For my test data the final result is:
#array([[ 0.70710678, 6.92820323],
# [ 0.70710678, 6.40312424],
# [ 0.70710678, 6.40312424],
# [ 6.36396103, 0. ]])
So the distance[i, j] element will give you the distance of point i to the center j.
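If what you actually want is the closest center for each point (that's my assumption about the use case, it isn't stated above), one possible follow-up is to take the argmin along the centers axis. The names nearest_center and nearest_distance are just ones I made up for this sketch:

```python
import numpy as np

points = np.array([[1, 1, 1], [2, 1, 1], [1, 2, 1], [5, 5, 5]])
centers = np.array([[1.5, 1.5, 1], [5, 5, 5]])

distance = np.sqrt(np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=2))

# Index of the closest center for each point (axis 1 is the centers axis)
nearest_center = np.argmin(distance, axis=1)
print(nearest_center)  # [0 0 0 1]

# The corresponding distance, picked out row by row
nearest_distance = distance[np.arange(len(points)), nearest_center]
```

For the test data the first three points end up with center 0 and the last point with center 1, which it coincides with.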
Summary:
You can put all of this in one line:
distance2 = np.sqrt(np.sum((points[:,None,:] - centers[None,:,:]) ** 2, axis=2))
Scipy solution (faster & shorter):
Or, if you have SciPy, use cdist:
from scipy.spatial.distance import cdist
distance3 = cdist(points, centers)
Both approaches give the same result, but cdist is the fastest for large numbers of points and centers.
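If you want to convince yourself that the two approaches agree, a quick check on the test data could look like this. I compare with np.allclose rather than == because the two code paths may differ in the last floating-point digits:

```python
import numpy as np
from scipy.spatial.distance import cdist

points = np.array([[1, 1, 1], [2, 1, 1], [1, 2, 1], [5, 5, 5]])
centers = np.array([[1.5, 1.5, 1], [5, 5, 5]])

# Broadcasting version
broadcast_result = np.sqrt(np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=2))

# SciPy version (the default metric is Euclidean)
cdist_result = cdist(points, centers)

print(np.allclose(broadcast_result, cdist_result))  # True
```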