I have a 2D matrix of values. Each row is a data point.
data = np.array(
[[2, 2, 3],
[4, 2, 4],
[1, 1, 4]])
Now if my test point is a single 1D numpy array like:
test = np.array([2,3,3])
I can do something simple like np.sqrt(np.sum((test-data)**2,axis=1)) to calculate the distance of the test point relative to all three data points.
However, if test is itself a 2D array of points to be tested, the above doesn't work, and I have been using something like:
test = np.array([[2,3,3],[4,1,2]])
for i in range(len(test)):
    print(np.sqrt(np.sum((test[i] - data) ** 2, axis=1)))
>>> [ 1. 2.44948974 2.44948974]
[ 2.44948974 2.23606798 3.60555128]
This computes the distance from each point in my test set to every point in the data set. It seems like there should be a way to vectorize the whole operation so that I get a (2,3) matrix of the corresponding distances back without the outer for loop.
(Note: While this particular example is about Euclidean distance, I often run into similar operations where I would like to combine every row of one matrix with every row of another, so I'm hoping there's a generalized way to set up problems of this nature using NumPy.)
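You can do it in one expression by repeating each test row len(data) times and tiling the data rows len(test) times (here ml is numpy.matlib and Xdims is the number of columns, data.shape[1]):
print(np.reshape(np.sqrt(np.sum((np.reshape(np.repeat(test, len(data), axis=0), (len(test) * len(data), Xdims)) - ml.repmat(data, len(test), 1)) ** 2, axis=1)), (len(test), len(data))).T)
The trailing .T returns the result with shape (len(data), len(test)); drop it to get the (2,3) layout asked for in the question.
Alternatively, use SciPy:
from scipy.spatial.distance import cdist
out = cdist(test, data)
It's efficient: cdist computes all pairwise distances in one call, uses the Euclidean metric by default, and accepts other metrics through its metric argument.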
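For the more general case raised in the question, plain NumPy broadcasting may also be worth a look. The following is a minimal sketch and not part of the answers above; the names diff and dist are only illustrative. It inserts a length-1 axis into each array so that the subtraction pairs every test row with every data row, then reduces over the coordinate axis.

```python
import numpy as np

data = np.array([[2, 2, 3],
                 [4, 2, 4],
                 [1, 1, 4]])
test = np.array([[2, 3, 3],
                 [4, 1, 2]])

# test[:, None, :] has shape (2, 1, 3); data[None, :, :] has shape (1, 3, 3).
# Broadcasting the subtraction yields one difference vector per
# (test row, data row) pair, so diff has shape (2, 3, 3).
diff = test[:, None, :] - data[None, :, :]

# Sum the squared differences over the last (coordinate) axis,
# leaving one Euclidean distance per pair.
dist = np.sqrt(np.sum(diff ** 2, axis=-1))

print(dist.shape)  # (2, 3)
print(dist)
```

The same pattern should carry over to other pairwise operations by swapping the subtraction or the reduction. One caveat: the intermediate array holds len(test) * len(data) * dims elements, so for large inputs something like cdist is likely the safer choice.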