The following is a very simple implementation of the k-means algorithm.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
DIM = 2
N = 2000
num_cluster = 4
iterations = 3
x = np.random.randn(N, DIM)
y = np.random.randint(0, num_cluster, N)
mean = np.zeros((num_cluster, DIM))
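for t in range(iterations):
    # Update step: recompute each cluster mean from its assigned points
    for k in range(num_cluster):
        mean[k] = np.mean(x[y==k], axis=0)
    # Assignment step: move each point to the cluster with the nearest mean
    for i in range(N):
        dist = np.sum((mean - x[i])**2, axis=1)
        pred = np.argmin(dist)
        y[i] = pred
# Plot each cluster in its own color
for k in range(num_cluster):
    plt.scatter(x[y==k,0], x[y==k,1])
plt.show()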
Here are two example outputs the code produces:
The first example (num_cluster = 4) looks as expected. The second example (num_cluster = 11), however, shows only one cluster, which is clearly not what I wanted. Whether the code works depends on the number of clusters I define and the number of iterations.
So far, I couldn't find the bug in the code. Somehow the clusters disappear but I don't know why.
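One thing that could be checked is whether any cluster ends up with no points, since `np.mean` over an empty selection returns NaN and `np.argmin` returns the index of the first NaN it encounters. The following is only a minimal sketch on a tiny hand-made data set, assuming empty clusters are involved; the helper `update_means` and the three sample points are hypothetical and not part of the code above.

```python
import warnings
import numpy as np

def update_means(x, y, num_cluster):
    """Recompute the cluster means the same way as the update step above."""
    mean = np.zeros((num_cluster, x.shape[1]))
    with warnings.catch_warnings():
        # np.mean over an empty selection emits a RuntimeWarning and yields NaN
        warnings.simplefilter("ignore", category=RuntimeWarning)
        for k in range(num_cluster):
            mean[k] = np.mean(x[y == k], axis=0)
    return mean

# Hypothetical toy data: cluster 1 has no points assigned to it.
x = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
y = np.array([0, 0, 2])
num_cluster = 3

sizes = np.bincount(y, minlength=num_cluster)  # points per cluster
mean = update_means(x, y, num_cluster)
print("cluster sizes:", sizes)
print("means:\n", mean)

# Assignment step, written as in the code above.
new_y = np.array([np.argmin(np.sum((mean - p) ** 2, axis=1)) for p in x])
print("new labels:", new_y)
```

In this toy case the empty cluster gets a NaN mean and every point is then assigned to it, which would match the single cluster in the second plot. Printing `np.bincount(y, minlength=num_cluster)` inside the main loop would show whether the same thing happens with num_cluster = 11.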
Does anyone see my mistake?



With num_cluster = 11, for example, the code does not work anymore.