We don't know a priori how many "hops" it takes to find a cluster center. So we'll have to do some iteration and check if we've landed on a cluster center:
import numpy as np
a = np.array([6, 8, 1, -1, 0, 3, -2, 4, -3, 10, 5])
for i in a:
mask = a>=0
# We can stop when all the values in `a` are negative
if not mask.any(): break
# perform a hop
a[mask] = a[a[mask]]
print(a)
yields
[-2 -3 -3 -1 -2 -1 -2 -2 -3 -1 -1]
To see what's going on, it might be clearer to look at a simpler value of a:
a = np.arange(-1, 10)
print(a)
for i in a:
mask = a>=0
# We can stop when all the values in `a` are negative
if not mask.any(): break
# perform a hop
a[mask] = a[a[mask]]
print(a)
prints
[-1 0 1 2 3 4 5 6 7 8 9]
[-1 -1 0 1 2 3 4 5 6 7 8]
[-1 -1 -1 -1 0 1 2 3 4 5 6]
[-1 -1 -1 -1 -1 -1 -1 -1 0 1 2]
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
Here we can clearly see that the values at each index (i.e. a column in the
output) are decreasing. You might have expected the values to decrease by 1, but
in fact they decrease by more than one in subsequent iterations because of the
effect of previous hops.
More generally, let G be a sequence of indices that hop to the same cluster
value. An invariant of the loop is that values in G map to values in G.
Also, at each index, with each iteration the distance (i.e. number of hops) to a
cluster value decreases (or is zero). These two facts together imply by the Banach fixed point theorem that the
algorithm converges to a fixed point.
Also with each iteration the number of locations filled with cluster values
grows exponentially. If before an iteration there are n locations filled with cluster values, then after a hop there will be 2n locations filled with cluster values. So the convergence is exponential.
Following the OP's lead, I changed the while True loop to a for-loop. The fixed point will be found in fewer than len(a) steps, but nevertheless for i in a suffices.
Note that the code above assumes that every index converges to a cluster value. If that is not true, then the original while True loop could loop forever. The for i in a will end, but maybe not at all cluster values. A simple example which illustrates the problem is a = np.array([1,0]).
Interestingly, the "simple example" np.arange(-1, 10) is also the worst-case
scenario in terms of maximum iterations required for an array of length 11.
Since the number of cluster values grows as 2**n, the loop requires at most on the order of log2(len(a)) iterations. Therefore, we can write a function will either return the cluster values or raise a ValueError when a contains infinite cycles:
def converge(a):
for i in range(int(np.log2(len(a)))+1):
mask = a>=0
# We can stop when all the values in `a` are negative
if not mask.any(): break
# perform a hop
a[mask] = a[a[mask]]
else:
# for-loop completed without reaching break
mask = a>=0
if mask.any():
raise ValueError('{!r} contains infinite cycles'.format(a))
return a
a[9].