I am almost done with my first real Python data science project, but there is one last thing I can't figure out. I have the following code to plot the results of my PCA and K-means clustering:

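import matplotlib.pyplot as plt
import seaborn as sns
from adjustText import adjust_text

# passers_pca_kmeans is the DataFrame holding the PCA components, cluster labels and names
y_axis = passers_pca_kmeans['Component 1']
x_axis = passers_pca_kmeans['Component 2']

plt.figure(figsize=(10,8))
# Pass x and y as keywords; recent seaborn versions no longer accept them positionally
sns.scatterplot(x=x_axis, y=y_axis, hue=passers_pca_kmeans['Segment'], palette=['g','r','c','m'])
plt.title('Clusters by PCA Components')
plt.grid(zorder=0,alpha=.4)

# One text label per row, placed at the point's coordinates
texts = [plt.text(x0,y0,name,ha='right',va='bottom') for x0,y0,name in zip(
    passers_pca_kmeans['Component 2'], passers_pca_kmeans['Component 1'], passers_pca_kmeans.name)]

# Nudge the labels apart so they overlap less
adjust_text(texts)

plt.show()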
  • I finally got the correct code to annotate the points using adjustText, but my plot has too many points to label them all; it looks like a mess with text everywhere.
  • I would like to annotate the scatterplot based on the value in the column 'Segment'.
    • The values in this column are the names of my four clusters 'first', 'second', 'third', 'fourth'.
  • How do I alter my adjustText code to only annotate points where 'Segment'='first'?
    • Would this be an np.where situation?

5 Comments

  • This answer shows how to add labels near data points individually. In the example they loop over all the points but you don't have to. Commented Jun 8, 2020 at 4:57
  • Oh wait, you're using Seaborn. It may still work, but I'm not sure. Commented Jun 8, 2020 at 4:59
  • Does this answer your question? Adding labels in x y scatter plot with seaborn Commented Jun 8, 2020 at 5:04
  • That's where I'm at right now. However, labeling all data points is too much of a mess. I want to label certain data points based on a column value in my data frame. Commented Jun 8, 2020 at 5:07
  • The answers in the duplicate use the entire dataframe. You just need to select the points you want with a Boolean mask and pass that subset instead of the entire dataframe. Commented Jun 8, 2020 at 5:10

1 Answer

You could filter the inputs to the plt.text call with a boolean mask, something like:

# True for the rows that belong to the 'first' cluster
mask = (passers_pca_kmeans["Segment"] == "first")
x = passers_pca_kmeans["Component 2"][mask]
y = passers_pca_kmeans["Component 1"][mask]
names = passers_pca_kmeans.name[mask]

texts = [plt.text(x0,y0,name,ha='right',va='bottom') for x0,y0,name in zip(x,y,names)]
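To see the mask approach end to end, here is a minimal sketch on made-up data. The DataFrame `df` is a hypothetical stand-in for `passers_pca_kmeans`, and `label_segment` is a helper name invented for this example, not part of matplotlib or adjustText.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for passers_pca_kmeans; all values are made up.
df = pd.DataFrame({
    "Component 1": [0.1, 0.5, -0.3, 0.8, -0.6],
    "Component 2": [1.2, -0.4, 0.7, 0.0, -1.1],
    "Segment": ["first", "second", "first", "third", "fourth"],
    "name": ["A", "B", "C", "D", "E"],
})


def label_segment(ax, data, segment):
    """Add a text label for every row whose 'Segment' equals `segment`."""
    subset = data[data["Segment"] == segment]  # boolean mask selects the rows
    return [
        ax.text(row["Component 2"], row["Component 1"], row["name"],
                ha="right", va="bottom")
        for _, row in subset.iterrows()
    ]


fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(df["Component 2"], df["Component 1"])  # all points are still drawn
texts = label_segment(ax, df, "first")            # only 'first' gets labels
print([t.get_text() for t in texts])  # ['A', 'C']
```

The returned list can be passed to `adjust_text(texts)` exactly as in the question, since it holds the same kind of Text objects.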

You could also make an unruly list comprehension by adding an if condition:


x = passers_pca_kmeans["Component 2"]
y = passers_pca_kmeans["Component 1"]
names = passers_pca_kmeans.name
segments = passers_pca_kmeans["Segment"]

# Only create a label when the row belongs to the 'first' cluster
texts = [plt.text(x0,y0,name,ha='right',va='bottom') for x0,y0,name,segment in zip(x,y,names,segments) if segment == "first"]

np.where would probably work as well (it returns the positions where the mask is True), but the boolean mask on its own is enough here.
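For completeness, a minimal sketch of what an np.where variant might look like. The DataFrame `df` is a made-up stand-in for `passers_pca_kmeans`. With a single argument, `np.where` returns a tuple of index arrays, so the row positions are its first element, and `DataFrame.iloc` selects by position.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for passers_pca_kmeans; all values are made up.
df = pd.DataFrame({
    "Component 1": [0.1, 0.5, -0.3, 0.8, -0.6],
    "Component 2": [1.2, -0.4, 0.7, 0.0, -1.1],
    "Segment": ["first", "second", "first", "third", "fourth"],
    "name": ["A", "B", "C", "D", "E"],
})

mask = df["Segment"] == "first"

# np.where(condition) returns a tuple of arrays; [0] is the row positions.
idx = np.where(mask)[0]
selected = df.iloc[idx]

print(idx.tolist())               # [0, 2]
print(selected["name"].tolist())  # ['A', 'C']

# Same rows as plain boolean indexing, so np.where adds nothing here.
same_as_mask = selected.equals(df[mask])
print(same_as_mask)               # True
```

Since both routes select identical rows, the mask is the shorter choice; np.where is mainly useful when the integer positions themselves are needed.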

3 Comments

Awesome! This worked. Now, is there a possible way to extend the distance between the text and the points?
Does calling adjust_text not work here? I'm not familiar with that module.
It does. Now I just need to figure out how to draw a line that leads from each label to its point. I will check the documentation.
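On the last two comments: one way to get both a larger gap and a connecting line with plain matplotlib is `ax.annotate`, which takes an offset for the text and an `arrowprops` dict for the line. The sketch below uses made-up points and a 20-point offset chosen arbitrarily. As far as I know, `adjust_text` also accepts an `arrowprops` dict (for example `adjust_text(texts, arrowprops=dict(arrowstyle='-', color='gray'))`), but check the adjustText documentation for your installed version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical (x, y, name) points for the 'first' cluster; values are made up.
points = [(1.2, 0.1, "A"), (0.7, -0.3, "C")]

fig, ax = plt.subplots()
ax.scatter([p[0] for p in points], [p[1] for p in points])

annotations = [
    ax.annotate(
        name,
        xy=(x0, y0),                 # the data point the line leads to
        xytext=(20, 20),             # label offset from the point
        textcoords="offset points",  # interpret xytext in points, not data units
        ha="left",
        va="bottom",
        arrowprops=dict(arrowstyle="-", color="gray", lw=0.8),  # plain line
    )
    for x0, y0, name in points
]
```

Increasing the `xytext` offset moves the labels further from their points, and `arrowstyle="->"` would draw an arrowhead instead of a plain line.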
