
I am using Python's sklearn.cluster to do clustering. I have 61 samples, each with 26 features. Original data:

UserID  Communication_dur   Lifestyle_dur   Music & Audio_dur   Others_dur  Personnalisation_dur    Phone_and_SMS_dur   Photography_dur Productivity_dur    Social_Media_dur    System_tools_dur    ... Music & Audio_Freq  Others_Freq Personnalisation_Freq   Phone_and_SMS_Freq  Photography_Freq    Productivity_Freq   Social_Media_Freq   System_tools_Freq   Video players & Editors_Freq    Weather_Freq
1   63  219 9   10  99  42  36  30  76  20  ... 2   1   11  5   3   3   9   1   4   8
2   9   0   0   6   78  0   32  4   15  3   ... 0   2   4   0   2   1   2   1   0   0


import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

Sc = StandardScaler()
X = Sc.fit_transform(df)
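A minimal sketch of the scaling step, using a random 61 x 26 stand-in for `df` (the column names `feature_0`… are hypothetical). One assumption worth checking: if `UserID` is an ordinary column rather than the index, `fit_transform(df)` would scale the identifier as if it were a feature. The sketch drops it first; `errors="ignore"` makes the drop a no-op when the column is not there.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the question's data: 61 users x 26 usage features.
# Random integers are used only so the example runs; they are not real data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(61, 26)),
                  columns=[f"feature_{i}" for i in range(26)])
df.insert(0, "UserID", np.arange(1, 62))  # assumption: UserID is a regular column

# Drop the identifier so only the 26 usage features are scaled.
features = df.drop(columns=["UserID"], errors="ignore")

Sc = StandardScaler()
X = Sc.fit_transform(features)

# After scaling, each column has mean ~0 and (population) standard deviation ~1.
print(X.shape)
print(np.allclose(X.mean(axis=0), 0), np.allclose(X.std(axis=0), 1))
```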

I applied PCA to the scaled data so that I can plot the clusters found by K-means.

pca = PCA(3)
pca.fit(X)
pca_data = pd.DataFrame(pca.transform(X))
print(pca_data.head())

Output (first five rows):

    0  1  2
 0  8 -4  5
 1 -2 -2  1
 2  1  1 -0
 3  2 -1  1
 4  3 -1 -3
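Before plotting only a few components, it can help to check how much of the variance they keep. A sketch on a random 61 x 26 stand-in for `X`, using sklearn's `explained_variance_ratio_`; the column names `PC1`–`PC3` are an illustrative choice (the question keeps the default 0, 1, 2).

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Stand-in for the standardized 61 x 26 matrix X (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((61, 26))

pca = PCA(n_components=3)
# fit_transform is equivalent to calling fit and then transform on the same data.
pca_data = pd.DataFrame(pca.fit_transform(X), columns=["PC1", "PC2", "PC3"])

# Fraction of the total variance captured by each component (sorted, largest first),
# and by the three components together.
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())
```

A 2D scatter of `PC1` against `PC2` shows only the variance in the first two ratios, so a low total here means the plot can make clusters look more or less separated than they are.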

Then I fit K-means on the PCA output:

kmeans_pca = KMeans(n_clusters=10, init="k-means++", random_state=42)
kmeans_pca.fit(pca_data)
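What the fitted model exposes, which is what any plot needs: `labels_` (one cluster index per row) and `cluster_centers_` (one centroid per cluster, in the same PCA space). The data is a random stand-in; `n_init=10` is passed explicitly only to avoid a version-dependent FutureWarning in some scikit-learn releases.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Stand-in for the 61 x 3 PCA output (random, for illustration only).
rng = np.random.default_rng(0)
pca_data = pd.DataFrame(rng.standard_normal((61, 3)))

kmeans_pca = KMeans(n_clusters=10, init="k-means++", n_init=10, random_state=42)
kmeans_pca.fit(pca_data)

# labels_: one cluster index (0..9) per row of pca_data.
# cluster_centers_: one centroid per cluster, in the same 3-D PCA space.
counts = np.bincount(kmeans_pca.labels_, minlength=10)  # points per cluster
print(kmeans_pca.labels_[:10])
print(kmeans_pca.cluster_centers_.shape)  # (10, 3)
print(counts)
```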

Now I want to plot the resulting clusters. How can I do that?

  • can you give some example data? (minimal reproducible example) Commented Feb 12, 2021 at 11:33
  • I have modified the question. Commented Feb 12, 2021 at 11:38
  • Please read the description of the ml tag. Commented Feb 12, 2021 at 13:39

1 Answer


I haven't tested this, but you can visualize the clusters with code like the following:

import matplotlib.pyplot as plt
import seaborn as sns

def show_clusters(data, labels):
    # One distinct color per cluster label
    palette = sns.color_palette('hls', n_colors=len(set(labels)))
    # Plot the first two PCA components, colored by cluster
    sns.scatterplot(x=data.iloc[:, 0], y=data.iloc[:, 1], hue=labels, palette=palette)
    plt.axis('off')
    plt.show()

Then call the function, passing the PCA data and the K-means cluster labels:

show_clusters(pca_data, kmeans_pca.labels_)

Output: (image: scatter plot of the clusters)
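A matplotlib-only variant (no seaborn) that also marks the centroids. Because `kmeans_pca` was fit on `pca_data`, `kmeans_pca.cluster_centers_` lives in the same PCA space, so its first two columns line up with the plotted axes. `plot_clusters_with_centers` is a hypothetical name, and the data is a random stand-in.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook or GUI session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def plot_clusters_with_centers(data, labels, centers):
    """Scatter the first two PCA components, colored by cluster, with centroids marked."""
    fig, ax = plt.subplots()
    # tab10 has 10 colors, one per cluster when n_clusters=10.
    ax.scatter(data.iloc[:, 0], data.iloc[:, 1], c=labels, cmap="tab10", s=20)
    # Centroids share the PCA space, so columns 0 and 1 match the axes above.
    ax.scatter(centers[:, 0], centers[:, 1], marker="x", c="black", s=100)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    return fig, ax

# Random stand-in for pca_data, for illustration only.
rng = np.random.default_rng(0)
pca_data = pd.DataFrame(rng.standard_normal((61, 3)))
kmeans_pca = KMeans(n_clusters=10, n_init=10, random_state=42).fit(pca_data)

fig, ax = plot_clusters_with_centers(pca_data, kmeans_pca.labels_, kmeans_pca.cluster_centers_)
# In an interactive session, call plt.show() here to display the figure.
```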


2 Comments

Thank you for your answer! This error is raised: TypeError: '(slice(None, None, None), 0)' is an invalid key
Fixed: I changed x=data[:, 0] to x=data.iloc[:, 0] (and similarly for y), because your data is a pandas DataFrame, not a NumPy array. Also, this is a 2D visualization that uses only the first two columns, so the number of PCA components should be 2 in this case.
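Since the question keeps 3 PCA components, a 3D scatter is an alternative to dropping to 2. A sketch using matplotlib's built-in `"3d"` projection; `show_clusters_3d` is a hypothetical name and the data is a random stand-in.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook or GUI session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def show_clusters_3d(data, labels):
    """Scatter all three PCA components in 3D, colored by K-means label."""
    fig = plt.figure()
    # The "3d" projection is registered by default in matplotlib >= 3.2.
    ax = fig.add_subplot(projection="3d")
    ax.scatter(data.iloc[:, 0], data.iloc[:, 1], data.iloc[:, 2], c=labels, cmap="tab10")
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.set_zlabel("PC3")
    return fig, ax

# Random stand-in for the 61 x 3 pca_data, for illustration only.
rng = np.random.default_rng(0)
pca_data = pd.DataFrame(rng.standard_normal((61, 3)))
kmeans_pca = KMeans(n_clusters=10, n_init=10, random_state=42).fit(pca_data)

fig, ax = show_clusters_3d(pca_data, kmeans_pca.labels_)
# In an interactive session, call plt.show() here to display (and rotate) the plot.
```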
