DBSCAN in scikit-learn of Python: Trouble understanding result of DBSCAN

Question

This example is from Data Science for dummies:

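import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.preprocessing import scale
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

digits = load_digits()
X = digits.data
ground_truth = digits.target

# Standardize the features, then keep 40 principal components
pca = PCA(n_components=40)
Cx = pca.fit_transform(scale(X))

# The book passes random_state=1; recent scikit-learn releases of DBSCAN do not accept it
DB = DBSCAN(eps=4.35, min_samples=25)
DB.fit(Cx)

for k, cl in enumerate(np.unique(DB.labels_)):
    if cl >= 0:  # skip the noise label -1
        example = np.min(np.where(DB.labels_ == cl))  # question 1
        plt.subplot(2, 3, k)
        plt.imshow(digits.images[example], cmap='binary',  # question 2
                   interpolation='none')
        plt.title('cl ' + str(cl))
plt.show()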

My questions are:

1. np.where(DB.labels_==cl): I don't understand which array np.where is applied to. When I print np.where(DB.labels_==cl), it looks as if it were applied to DB.core_sample_indices_, but I don't understand why. As far as I can tell from the documentation for np.where, np.where(DB.labels_==cl) should be applied to DB.labels_.
2. How come np.min(np.where(DB.labels_==cl)) gives me an index that plots the correct image from digits.images? Thank you.

Thomas Moreau · Accepted Answer · 2016-03-10 18:51:14Z

The output of the operation DB.labels_ == cl is a Boolean array such that (DB.labels_ == cl)[i] is True exactly when DB.labels_[i] == cl.

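Thus np.where is applied to the Boolean array DB.labels_ == cl, not to DB.core_sample_indices_. When called with a single array, it returns the indices of the nonzero elements of that array (as a tuple with one index array per dimension), i.e. the positions of the elements that are True.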
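As a minimal sketch of these two steps, here is the same pattern on a small made-up label array standing in for DB.labels_ (the values are illustrative and not taken from the digits data):

```python
import numpy as np

# Hypothetical labels standing in for DB.labels_ (-1 marks noise)
labels = np.array([0, -1, 1, 0, 1, 1])
cl = 1

mask = labels == cl      # Boolean array with the same length as labels
idx = np.where(mask)     # tuple holding one index array per dimension

print(mask)     # [False False  True False  True  True]
print(idx[0])   # [2 4 5]
```

The printed indices are positions in labels, which is why they can be used directly to index any other array stored in the same row order.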

The operation np.where(DB.labels_ == cl) therefore returns the indices of the elements of DB.labels_ that are equal to cl. These are the samples of the data passed to fit that DB has labeled as part of cluster cl.
In this case np.min returns the smallest index in that array, which means it retrieves the first sample of your dataset that was assigned to cluster cl. By looping through all the clusters, you retrieve one example image for each of your clusters.
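To make the "first member of each cluster" step concrete, here is a small sketch on a made-up label array. The helper name first_index_per_cluster is hypothetical, and np.flatnonzero is used here as an equivalent, slightly more direct way of getting the index array:

```python
import numpy as np

def first_index_per_cluster(labels):
    """Map each cluster label (noise label -1 excluded) to its smallest sample index."""
    labels = np.asarray(labels)
    first = {}
    for cl in np.unique(labels):
        if cl >= 0:
            # Same value as np.min(np.where(labels == cl))
            first[int(cl)] = int(np.flatnonzero(labels == cl)[0])
    return first

# Hypothetical labels standing in for DB.labels_
labels = np.array([1, -1, 0, 1, 0, 2, -1, 2])
result = first_index_per_cluster(labels)
print(result)  # {0: 2, 1: 0, 2: 5}
```

Each value in the dictionary is the kind of index that the loop in the question passes to digits.images.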

These indices correspond to the ones in digits.images because DB.labels_ contains one label for each point in the dataset that you fed to DB.fit. scale and PCA only transform the features and keep the rows in place, so this dataset has the same order as digits.images.
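The difference between DB.labels_ and DB.core_sample_indices_ raised in the question can be checked on a toy dataset. The following is only a sketch: the one-dimensional points and the eps and min_samples values are made up for illustration. It shows that labels_ has one entry per row passed to fit, while core_sample_indices_ lists only the core points:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up 1-D points: two dense groups, one border point (4.0) and one outlier (50.0)
X = np.array([[0], [1], [2], [4], [20], [21], [22], [50]], dtype=float)

db = DBSCAN(eps=2.5, min_samples=3).fit(X)

# One label per input row, in the same order as X
print(len(db.labels_) == len(X))
print(db.labels_)

# Only the core points are listed here; the border point (row 3) and the
# noise point (row 7) are missing even though row 3 belongs to cluster 0
print(db.core_sample_indices_)
```

Because labels_ is aligned row by row with the input, an index found in it also addresses the matching row of the original data, which is what the plotting loop relies on.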
