
I am working on k-means clustering. I wrote some code with the help of references available on the web, but when I run it, it raises an error:

Traceback (most recent call last):
  File "clustering.py", line 16, in <module>
    ds = df[np.where(labels==i)]
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1678, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1685, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1050, in _get_item_cache
    res = cache.get(item)
TypeError: unhashable type: 'numpy.ndarray'

Many previous threads report the same error, but none of their solutions fixes it in my program. How can I debug this error?

The code I used:

from sklearn import cluster
import pandas as pd

df = [
[0.57,-0.845,-0.8277,-0.1585,-1.616],
[0.47,-0.14,-0.5277,-0.158,-1.716],
[0.17,-0.845,-0.5277,-0.158,-1.616],
[0.27,-0.14,-0.8277,-0.158,-1.716]]

df = pd.DataFrame(df,columns= ["a","b","c","d", "e"])

# df = pd.read_csv("cleaned_remove_cor.csv")

k = 3
kmeans = cluster.KMeans(n_clusters=k)
kmeans.fit(df)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
from matplotlib import pyplot
import numpy as np

for i in range(k):
    # select only data observations with cluster label == i
    ds = df[np.where(labels==i)]
    # plot the data observations
    pyplot.plot(ds[:,0],ds[:,1],'o')
    # plot the centroids
    lines = pyplot.plot(centroids[i,0],centroids[i,1],'kx')
    # make the centroid x's bigger
    pyplot.setp(lines,ms=15.0)
    pyplot.setp(lines,mew=2.0)
pyplot.show()

The shape of my DataFrame is (8127x600)

  • always give the full error traceback, not just the last line. Commented Mar 9, 2016 at 7:52
  • @cel updated error log Commented Mar 9, 2016 at 7:54
  • ds = df[np.where(labels==i)] seems very strange. Did you mean: ds = df[labels==i]? Commented Mar 9, 2016 at 8:00
  • Trim down your dataset and modify this to be a self-contained and runnable example. Commented Mar 9, 2016 at 8:05
  • @DavidG I have updated my question with a simple example. I ran this code on the data frame above and it throws the same error. Commented Mar 9, 2016 at 9:44

1 Answer


I tried this and it works for me: convert the pandas DataFrame to a NumPy matrix:

df = df.as_matrix(columns= ["a","b","c","d", "e"])
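Some background on why this helps: np.where(labels==i) returns a tuple of index arrays, and df[...] on a DataFrame treats that key as a column label and tries to hash it, which is where the TypeError in the traceback comes from. A plain NumPy array accepts that kind of indexing, or a boolean mask directly. Note that as_matrix was deprecated and then removed in pandas 1.0, so on newer versions to_numpy() (or .values) is the replacement. Below is a minimal sketch of the selection step only, assuming numpy and pandas are installed. The labels array is a made-up stand-in for kmeans.labels_, and rows_in_cluster is a hypothetical helper name, not part of any library.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0.57, -0.845, -0.8277, -0.1585, -1.616],
     [0.47, -0.14, -0.5277, -0.158, -1.716],
     [0.17, -0.845, -0.5277, -0.158, -1.616],
     [0.27, -0.14, -0.8277, -0.158, -1.716]],
    columns=["a", "b", "c", "d", "e"],
)

# Stand-in for kmeans.labels_ (made-up values, one label per row).
labels = np.array([0, 1, 0, 2])
k = 3

# Convert once to a plain NumPy array. to_numpy() needs pandas >= 0.24;
# df.values does the same on older versions.
data = df.to_numpy()


def rows_in_cluster(data, labels, i):
    """Return the rows of `data` whose cluster label equals `i` (hypothetical helper)."""
    # A boolean mask selects rows directly; np.where is not needed.
    return data[labels == i]


clusters = {i: rows_in_cluster(data, labels, i) for i in range(k)}

for i, ds in clusters.items():
    # ds[:, 0] and ds[:, 1] are the columns the plotting loop would use.
    print(i, ds.shape)
```

With data as an array, ds[:,0] and ds[:,1] in the plotting loop work as written. To stay in pandas instead, df[labels == i] selects the rows, and the columns are then read with ds.iloc[:, 0] rather than ds[:,0].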