I have a dataset containing 61 rows (users) and 26 columns, on which I apply clustering with k-means and other algorithms. I first ran KMeans on the dataset after normalizing it and identified 10 clusters. In parallel, I also tried to visualize these clusters, which is why I use PCA to reduce the number of features.

Here is a sample of the data, followed by the code I have written:

UserID  Communication_dur   Lifestyle_dur   Music & Audio_dur   Others_dur  Personnalisation_dur    Phone_and_SMS_dur   Photography_dur Productivity_dur    Social_Media_dur    System_tools_dur    ... Music & Audio_Freq  Others_Freq Personnalisation_Freq   Phone_and_SMS_Freq  Photography_Freq    Productivity_Freq   Social_Media_Freq   System_tools_Freq   Video players & Editors_Freq    Weather_Freq
1   63  219 9   10  99  42  36  30  76  20  ... 2   1   11  5   3   3   9   1   4   8
2   9   0   0   6   78  0   32  4   15  3   ... 0   2   4   0   2   1   2   1   0   0


import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize the features, then project onto the first 3 principal components
Sc = StandardScaler()
X = Sc.fit_transform(df)
pca = PCA(3)
pca.fit(X)
pca_data = pd.DataFrame(pca.transform(X))
print(pca_data.head())

This gives the following result:

   0  1  2
0  8 -4  5
1 -2 -2  1
2  1  1 -0
3  2 -1  1
4  3 -1 -3

I want to plot the clusters of my dataset using PCA and interpret the results. I am really new to this field, and any advice would be greatly appreciated!

Thanks in advance once again.

  • You want them 3D or 2D? 2D would be easier, but now you have 3D. Commented Feb 15, 2021 at 8:31
  • I want 2D! I can change it to pca = PCA(2). Commented Feb 15, 2021 at 8:33
  • Does this answer your question? How to plot clusters in python? Commented Feb 15, 2021 at 8:47
  • No, I can't find any solution! Commented Feb 15, 2021 at 8:54

1 Answer

Using an example dataset:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA 
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

df, y = make_blobs(n_samples=70, centers=10, n_features=26, random_state=999, cluster_std=1)

Perform scaling, PCA and put the PC scores into a dataframe:

Sc = StandardScaler()
X = Sc.fit_transform(df)
pca = PCA(2) 
pca_data = pd.DataFrame(pca.fit_transform(X),columns=['PC1','PC2']) 
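
To help interpret the projection, it can be useful to check how much variance the two components retain and which original features drive them. The following is only a sketch on the same synthetic data; the feature names f0 to f25 are placeholders I made up, so substitute your own column names.

```python
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same synthetic data as in the answer; f0..f25 are placeholder feature names
data, _ = make_blobs(n_samples=70, centers=10, n_features=26,
                     random_state=999, cluster_std=1)
feature_names = [f"f{i}" for i in range(data.shape[1])]

X = StandardScaler().fit_transform(data)
pca = PCA(2)
scores = pca.fit_transform(X)

# Fraction of the total variance captured by each principal component
explained = pca.explained_variance_ratio_
print("explained variance ratio:", explained.round(3))
print("total for 2 PCs:", round(float(explained.sum()), 3))

# Loadings: one row per original feature, one column per component
loadings = pd.DataFrame(pca.components_.T, index=feature_names,
                        columns=["PC1", "PC2"])

# Features with the largest absolute weight on PC1
top_pc1 = loadings["PC1"].abs().sort_values(ascending=False).head(5)
print(top_pc1)
```

If the two components together explain only a small share of the variance, clusters that look overlapping in the 2D plot may still be well separated in the full feature space.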

Run k-means, store the cluster labels in the data frame, and plot directly with seaborn:

kmeans = KMeans(n_clusters=10).fit(X)
pca_data['cluster'] = pd.Categorical(kmeans.labels_)
sns.scatterplot(x="PC1",y="PC2",hue="cluster",data=pca_data)
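
Beyond the plot, one way to interpret the clusters is to look at their sizes and at the mean of each standardized feature per cluster. This is a rough sketch on the same synthetic data; the column names f0 to f25 are placeholders, and fixing random_state and n_init is my own choice for reproducibility, not part of the code above.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Same synthetic data as in the answer; f0..f25 are placeholder column names
data, _ = make_blobs(n_samples=70, centers=10, n_features=26,
                     random_state=999, cluster_std=1)
features = pd.DataFrame(data, columns=[f"f{i}" for i in range(data.shape[1])])

X = StandardScaler().fit_transform(features)
# random_state and n_init are fixed here only to make the run repeatable
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

scaled = pd.DataFrame(X, columns=features.columns)
scaled["cluster"] = kmeans.labels_

# Number of points assigned to each cluster
sizes = scaled["cluster"].value_counts().sort_index()
print(sizes)

# Mean standardized value of every feature within each cluster;
# values far from 0 mark features that characterize that cluster
profile = scaled.groupby("cluster").mean()
print(profile.round(2).iloc[:, :5])
```

Because the features are standardized, a value of about +1 in the profile means the cluster sits roughly one standard deviation above the overall mean for that feature.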


Or matplotlib:

fig,ax = plt.subplots()
scatter = ax.scatter(pca_data['PC1'], pca_data['PC2'],c=pca_data['cluster'],cmap='Set3',alpha=0.7)
legend1 = ax.legend(*scatter.legend_elements(),
                    loc="upper left", title="")
ax.add_artist(legend1)
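
If you also want the cluster centers on the plot, one option is to project them with the same fitted PCA, since k-means was run in the full standardized space. A minimal sketch, assuming the same synthetic data; the colormap, the marker style, and the fixed random_state are arbitrary choices of mine.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same synthetic data as in the answer
data, _ = make_blobs(n_samples=70, centers=10, n_features=26,
                     random_state=999, cluster_std=1)

X = StandardScaler().fit_transform(data)
pca = PCA(2)
scores = pca.fit_transform(X)

# random_state and n_init are fixed here only to make the run repeatable
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

# Centers live in the 26-dimensional scaled space; map them to PC1/PC2
centers_2d = pca.transform(kmeans.cluster_centers_)

fig, ax = plt.subplots()
ax.scatter(scores[:, 0], scores[:, 1], c=kmeans.labels_, cmap="tab10", alpha=0.7)
ax.scatter(centers_2d[:, 0], centers_2d[:, 1], marker="X", s=120, c="black")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
```

Call plt.show() afterwards when running this as a script.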


8 Comments

This error was raised: TypeError: data type not understood
Which version of seaborn are you on? I am on '0.11.0'. OK, I added matplotlib code.
Thank you for your answer! How do I deal with overlapping groups?
Hey, that's another question, and I cannot see your screen or your data to comment or help with that. Please post another question with reproducible data to get help.
I have also noticed that you have never accepted a single answer. Please see stackoverflow.com/help/someone-answers. SO is not a place for you to get other users to code for you!