This is for a K-Means algorithm homework assignment, so I do not want to use a built-in KMeans function. I have two NumPy arrays: one of centroids and one of data points. I am trying to find the distance from each centroid to each data point, but I don't know how to pass the arrays to my function so that it prints those distances. I want to end up with as many arrays of distances as there are centroids. Then I can compare the distances for each point, pick the smallest, and assign that point to the corresponding cluster. Finally, I take the mean of each cluster, and those means become my new centroids.
import numpy as np
centroids = np.array([[3,44],[5,15]])
dataPoints = np.array([[2,4],[17,4],[45,2],[45,7],[16,32],[32,14],[20,56],[68,33]])
def distance(a,b):
    for x in a: #for each point in centroids array
        for y in b: #for each point in the dataPoints array
            print(np.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2)) #print the distance
distance(centroids, dataPoints) #call the function with the data
The output I am getting:
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
[ 12.04159458 41.48493703]
What am I doing that is obviously wrong here? I should end up with 2 different arrays with 8 distances each.
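The loops iterate with x and y, but the body indexes a and b. Use x[i] and y[i] instead of a[i] and b[i]. To get all the distances for one centroid x at once, drop the inner loop and use print(np.sqrt(((dataPoints-x)**2).sum(axis=1))).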
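Putting that together, here is a minimal sketch of one way to get one array of distances per centroid and then do the assignment and update steps described in the question. The helper names `distances` and `kmeans_step` are made up for illustration and are not from the original post, and the sketch assumes every centroid ends up with at least one point (an empty cluster would produce a NaN mean).

```python
import numpy as np

centroids = np.array([[3, 44], [5, 15]])
dataPoints = np.array([[2, 4], [17, 4], [45, 2], [45, 7],
                       [16, 32], [32, 14], [20, 56], [68, 33]])

def distances(centroids, points):
    """Euclidean distances, one row per centroid, one column per point."""
    # points - c broadcasts the centroid over every row of points;
    # summing the squared differences along axis=1 gives one value per point.
    return np.array([np.sqrt(((points - c) ** 2).sum(axis=1))
                     for c in centroids])

def kmeans_step(centroids, points):
    """One assignment + update step (hypothetical helper for illustration)."""
    d = distances(centroids, points)
    # For each point (column), pick the row index of the smallest distance.
    labels = d.argmin(axis=0)
    # New centroid = mean of the points assigned to that cluster.
    # Assumption: no cluster is empty.
    new_centroids = np.array([points[labels == k].mean(axis=0)
                              for k in range(len(centroids))])
    return labels, new_centroids

d = distances(centroids, dataPoints)
labels, new_centroids = kmeans_step(centroids, dataPoints)

print(d.shape)        # (2, 8): 2 arrays of 8 distances each
print(labels)         # cluster index assigned to each data point
print(new_centroids)  # means of each cluster
```

Calling `kmeans_step` again with `new_centroids` in place of `centroids` repeats the process; a common stopping rule is to quit once the labels stop changing.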