
I am trying to implement the k-means clustering algorithm for a small project. I came across this article, which suggests that

K-Means is much faster if you write the update functions using operations on numpy arrays, instead of manually looping over the arrays and updating the values yourself.

That is exactly what I am doing: I iterate over the array and update each element individually. For each of the z elements in the dataset, I assign the nearest centroid to the cluster array:

    for i in range(z):
        clstr[i] = closest_center(data[i], cen)

and the function that finds the nearest centroid is:

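    from math import fabs

    def closest_center(x, clist):
        dlist = [fabs(x - i) for i in clist]  # one Python-level distance per centroid
        # Return the centroid with the smallest distance.
        return clist[dlist.index(min(dlist))]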
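For comparison, here is a sketch of what the article's advice looks like for the assignment step: the loop over z and the list comprehension inside closest_center collapse into one broadcast NumPy expression. The name `assign_clusters` and the sample values are hypothetical. Like `list.index(min(...))`, `np.argmin` picks the first centroid on ties, so the result should match the loop's `clstr`.

```python
import numpy as np

def assign_clusters(data, cen):
    """Hypothetical vectorized replacement for the per-element loop.

    data: 1D array of pixel intensities; cen: 1D array of centroid values.
    Returns, for every pixel, the value of its nearest centroid.
    """
    data = np.asarray(data, dtype=float)
    cen = np.asarray(cen, dtype=float)
    # Broadcasting (N, 1) - (1, K) yields an (N, K) matrix of distances.
    dist = np.abs(data[:, np.newaxis] - cen[np.newaxis, :])
    # Index of the closest centroid for every pixel, computed in C.
    nearest = np.argmin(dist, axis=1)
    return cen[nearest]

data = np.array([0, 10, 200, 250, 120])
cen = np.array([5, 130, 240])
clstr = assign_clusters(data, cen)
print(clstr)
```

The trade-off is memory: the intermediate distance matrix has one row per pixel and one column per centroid, which is usually cheap for the handful of clusters used in image segmentation.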

Since I am working with a grayscale image, each pixel is a single value, so I use the absolute difference as the Euclidean distance.
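For a single intensity value, the Euclidean distance between a pixel x and a centroid c reduces to the absolute difference:

```latex
d(x, c) = \sqrt{(x - c)^2} = |x - c|
```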

I noticed that OpenCV also implements this algorithm. Its version takes less than 2 s, while mine takes more than 70 s. What exactly is the article suggesting?

My images are loaded as grayscale and represented as a 2D NumPy array, which I flatten into a 1D array because it is easier to process.
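The flatten-then-restore round trip can be sketched with NumPy's `ravel` and `reshape`. The tiny image and the fixed threshold standing in for the clustering step are placeholders, not part of the original code.

```python
import numpy as np

# Hypothetical 2x3 grayscale image standing in for a real one.
img = np.array([[0, 10, 200],
                [250, 120, 15]], dtype=np.uint8)

# ravel() flattens to 1D; astype(float) avoids uint8 wrap-around in arithmetic.
pixels = img.ravel().astype(float)

# Placeholder for the clustering step: a fixed threshold, NOT k-means.
clustered = np.where(pixels < 128, 0.0, 255.0)

# Restore the original 2D shape for display or saving.
out = clustered.reshape(img.shape).astype(np.uint8)
print(out)
```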

  • Why do you want to implement this yourself? scipy already has a k-means clustering algorithm for you. Commented Apr 24, 2016 at 20:41
  • @AkshatMahajan As a small project in image processing. I already have access to it via OpenCV. Still, I have to do it without using the built-in function. Commented Apr 24, 2016 at 20:43
  • You should share more of your code so we can get a better idea. Also, shouldn't you use the Euclidean distance? Commented Apr 24, 2016 at 20:45
  • @Romain For a grayscale image there is only one element, so the Euclidean distance is the same as the absolute value on the real number line. Commented Apr 24, 2016 at 20:50
  • @SantoshLinkha A fairly big part of it is probably the implementation in C vs Python. Commented Apr 24, 2016 at 21:00

1 Answer


The list comprehension is likely what slows execution down. I would suggest vectorizing the function closest_center. This is straightforward for 1-dimensional arrays:

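    import numpy as np

    def closest_center(x, clist):
        # Use a NumPy array so x - clist is computed element-wise in C.
        clist = np.asarray(clist)
        # argmin gives the index of the smallest distance (first one on ties).
        return clist[np.argmin(np.abs(x - clist))]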
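The same idea extends to the whole algorithm: both the assignment step and the centroid update can be written without a Python loop over pixels, leaving only the loop over iterations. Below is a minimal sketch for 1D (grayscale) data, assuming random initialization from distinct pixel values and stopping when the centroids stop moving. `kmeans_1d` is a hypothetical name, and this is not OpenCV's implementation.

```python
import numpy as np

def kmeans_1d(pixels, k, n_iter=100, seed=0):
    """Minimal vectorized k-means for 1D data (sketch; hypothetical helper)."""
    pixels = np.asarray(pixels, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct pixel values picked at random
    # (raises ValueError if the image has fewer than k distinct values).
    centers = rng.choice(np.unique(pixels), size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: (N, K) distance matrix via broadcasting, argmin per row.
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # Update step: per-cluster sums and counts with bincount, no pixel loop.
        sums = np.bincount(labels, weights=pixels, minlength=k)
        counts = np.bincount(labels, minlength=k)
        # Keep the old centroid if a cluster is empty, to avoid dividing by zero.
        new_centers = np.where(counts > 0, sums / np.maximum(counts, 1), centers)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    # Final assignment against the final centroids.
    labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
    return labels, centers

img = np.array([[0, 10, 200], [250, 120, 15]], dtype=np.uint8)
labels, centers = kmeans_1d(img, k=2)
# Replace every pixel by its centroid and restore the 2D shape.
quantized = centers[labels].reshape(img.shape)
print(centers)
```

Each iteration still costs O(N·K), but that work happens inside NumPy's compiled routines rather than in the interpreter, which is where most of the gap to OpenCV comes from.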