Visualizing clusters result using PCA (Python)

Question

I have a dataset containing 61 rows (users) and 26 columns, on which I apply clustering with k-means and other algorithms. I first ran KMeans on the dataset after normalizing it and identified 10 clusters. In parallel, I also tried to visualize these clusters, which is why I use PCA to reduce the number of features.

Here is a sample of the data, followed by the code I have written:

UserID  Communication_dur   Lifestyle_dur   Music & Audio_dur   Others_dur  Personnalisation_dur    Phone_and_SMS_dur   Photography_dur Productivity_dur    Social_Media_dur    System_tools_dur    ... Music & Audio_Freq  Others_Freq Personnalisation_Freq   Phone_and_SMS_Freq  Photography_Freq    Productivity_Freq   Social_Media_Freq   System_tools_Freq   Video players & Editors_Freq    Weather_Freq
1   63  219 9   10  99  42  36  30  76  20  ... 2   1   11  5   3   3   9   1   4   8
2   9   0   0   6   78  0   32  4   15  3   ... 0   2   4   0   2   1   2   1   0   0


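import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# df is the dataframe shown above
Sc = StandardScaler()
X = Sc.fit_transform(df)  # standardize each column
pca = PCA(3)  # keep the first 3 principal components
pca.fit(X)
pca_data = pd.DataFrame(pca.transform(X))
print(pca_data.head())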

gives the following results:

I want to plot the clusters of my dataset using PCA and interpret the results. I am really new to this area, and any advice would be greatly appreciated!

Thanks in advance once again.

You want them 3D or 2D? 2D would be easier, but now you have 3D.
– Frightera, Commented Feb 15, 2021 at 8:31
Does this answer your question? How to plot clusters in python?
– user11989081, Commented Feb 15, 2021 at 8:47

StupidWolf · Accepted Answer · 2021-02-15 10:22:34Z

8

Using an example dataset:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA 
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

df, y = make_blobs(n_samples=70, centers=10,n_features=26,random_state=999,cluster_std=1)

Perform scaling, PCA and put the PC scores into a dataframe:

Sc = StandardScaler()
X = Sc.fit_transform(df)
pca = PCA(2) 
pca_data = pd.DataFrame(pca.fit_transform(X),columns=['PC1','PC2'])
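Since the question also asks how to interpret the result, it may help to check how much of the total variance the two components retain and which original features weigh most on each one. The following is only a sketch on the same synthetic data; the `feature_names` list is a hypothetical stand-in for your real column names.

```python
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same synthetic data as in the answer; substitute your own dataframe.
data, _ = make_blobs(n_samples=70, centers=10, n_features=26,
                     random_state=999, cluster_std=1)
X = StandardScaler().fit_transform(data)

pca = PCA(n_components=2)
scores = pca.fit_transform(X)

# Fraction of the total variance captured by each component.
explained = pca.explained_variance_ratio_
print("Explained variance ratio:", explained.round(3))
print("Cumulative:", round(float(explained.sum()), 3))

# Loadings: the weight of each original feature on each component.
# feature_names is hypothetical; with real data use df.columns instead.
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
loadings = pd.DataFrame(pca.components_.T,
                        index=feature_names, columns=["PC1", "PC2"])

# Features with the largest absolute weight on PC1.
top_pc1 = loadings["PC1"].abs().sort_values(ascending=False).head(5)
print(top_pc1)
```

If the cumulative ratio is low, the 2D plot hides much of the structure in the 26 features, so clusters that look mixed in the plot may still be separated in the full space.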

Run k-means on the scaled data, add the cluster labels to the data frame, and plot with seaborn:

kmeans = KMeans(n_clusters=10).fit(X)
pca_data['cluster'] = pd.Categorical(kmeans.labels_)
sns.scatterplot(x="PC1", y="PC2", hue="cluster", data=pca_data)

Or matplotlib:

fig,ax = plt.subplots()
scatter = ax.scatter(pca_data['PC1'], pca_data['PC2'],c=pca_data['cluster'],cmap='Set3',alpha=0.7)
legend1 = ax.legend(*scatter.legend_elements(),
                    loc="upper left", title="")
ax.add_artist(legend1)
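The question keeps three components, and a comment asks about 2D versus 3D, so here is a hedged sketch of the 3D variant using matplotlib's built-in 3D axes. It reuses the synthetic data from above; the `tab10` colormap and the fixed `random_state` and `n_init` values are my own choices, not part of the original answer.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Same synthetic data as in the answer; substitute your own dataframe.
data, _ = make_blobs(n_samples=70, centers=10, n_features=26,
                     random_state=999, cluster_std=1)
X = StandardScaler().fit_transform(data)

# Cluster in the full scaled space, then project to 3 components for display.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
scores3 = PCA(n_components=3).fit_transform(X)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
points = ax.scatter(scores3[:, 0], scores3[:, 1], scores3[:, 2],
                    c=labels, cmap="tab10", alpha=0.8)
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_zlabel("PC3")

# num=None asks for one legend entry per unique cluster label.
ax.legend(*points.legend_elements(num=None), title="cluster",
          loc="upper left")
```

Call `plt.show()` afterwards when running this as a script. A static 3D plot is often harder to read than the 2D one, so rotating it interactively usually helps.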

edited Feb 15, 2021 at 10:22

answered Feb 15, 2021 at 9:54


8 Comments

ab20225 Over a year ago

This error was raised: TypeError: data type not understood

StupidWolf Over a year ago

Which version of seaborn are you on? I am on '0.11.0'. OK, I have added matplotlib code.

ab20225 Over a year ago

Thank you for your answer! How do I deal with overlapping groups?

StupidWolf Over a year ago

Hey, that's another question, and I cannot see your screen or your data to comment or help with that. Please post another question with reproducible data to get help.

StupidWolf Over a year ago

I have also noticed that you have never accepted a single answer. Please see stackoverflow.com/help/someone-answers. SO is not a place for you to get other users to code for you!!!
