How to annotate certain data points on a python scatterplot based on column value

Question

I am almost done with my first real Python data science project. However, there is one last thing I can't seem to figure out. I have the following code to create a plot for my PCA and K-means clustering algorithm:

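import matplotlib.pyplot as plt
import seaborn as sns
from adjustText import adjust_text

y_axis = passers_pca_kmeans['Component 1']
x_axis = passers_pca_kmeans['Component 2']

plt.figure(figsize=(10,8))
# recent seaborn versions require x and y to be passed as keywords
sns.scatterplot(x=x_axis, y=y_axis, hue=passers_pca_kmeans['Segment'], palette=['g','r','c','m'])
plt.title('Clusters by PCA Components')
plt.grid(zorder=0,alpha=.4)

# one text label per row, placed at that row's point
texts = [plt.text(x0,y0,name,ha='right',va='bottom') for x0,y0,name in zip(
    passers_pca_kmeans['Component 2'], passers_pca_kmeans['Component 1'], passers_pca_kmeans.name)]

# nudge the labels so they overlap less
adjust_text(texts)

plt.show()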

I finally got the correct code to annotate the points using adjustText, but my plot has too many points to label them all; it looks like a mess with text everywhere.
I would like to annotate the scatterplot based on the value in the column 'Segment'.
- The values in this column are the names of my four clusters 'first', 'second', 'third', 'fourth'.
How do I alter my adjustText code to only annotate points where 'Segment' == 'first'?
- Would this be an np.where situation?

This answer shows how to add labels near data points individually. In the example they loop over all the points, but you don't have to.
– Bill, Commented Jun 8, 2020 at 4:57
Oh wait, you're using Seaborn. But it may still work; I'm not sure.
– Bill, Commented Jun 8, 2020 at 4:59
Does this answer your question? Adding labels in x y scatter plot with seaborn
– Trenton McKinney, Commented Jun 8, 2020 at 5:04
That's where I'm at right now. However, labeling all data points is too much of a mess. I want to label certain data points based on a column value in my data frame.
– bismo, Commented Jun 8, 2020 at 5:07
The answers in the duplicate use the entire dataframe; you just need to Boolean select the points you want and pass that instead of the entire dataframe.
– Trenton McKinney, Commented Jun 8, 2020 at 5:10

Tom · Accepted Answer · 2020-06-08 05:08:31Z

You could boolean slice your input into the text call, something like:

# True for the rows that belong to the 'first' cluster
mask = (passers_pca_kmeans["Segment"] == "first")
x = passers_pca_kmeans["Component 2"][mask]
y = passers_pca_kmeans["Component 1"][mask]
names = passers_pca_kmeans.name[mask]

texts = [plt.text(x0,y0,name,ha='right',va='bottom') for x0,y0,name in zip(x,y,names)]

You could also make an unruly list comprehension by adding an if condition:


x = passers_pca_kmeans["Component 2"]
y = passers_pca_kmeans["Component 1"]
names = passers_pca_kmeans.name
segments = passers_pca_kmeans["Segment"]

# only create a label when the row's segment is 'first'
texts = [plt.text(x0,y0,name,ha='right',va='bottom') for x0,y0,name,segment in zip(x,y,names,segments) if segment == "first"]

I bet there is an answer with np.where as well.
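For what it's worth, here is a rough sketch of what an np.where version might look like. The small dataframe is a made-up stand-in for passers_pca_kmeans (same column names, invented values), and the Agg backend is only there so it runs without a display. It ends up selecting the same rows as the boolean mask, so it is mostly a matter of taste.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for passers_pca_kmeans (invented values).
df = pd.DataFrame({
    "Component 1": [0.5, -1.2, 2.0, 0.1, -0.7],
    "Component 2": [1.0, 0.3, -0.4, 2.2, -1.5],
    "Segment": ["first", "second", "first", "third", "fourth"],
    "name": ["A", "B", "C", "D", "E"],
})

# np.where with only a condition returns a tuple of index arrays;
# [0] gives the positions of the rows where the condition holds.
rows = np.where(df["Segment"].to_numpy() == "first")[0]
selected = df.iloc[rows]

fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(df["Component 2"], df["Component 1"])  # plot every point

# Label only the selected rows.
texts = [
    ax.text(x0, y0, name, ha="right", va="bottom")
    for x0, y0, name in zip(
        selected["Component 2"], selected["Component 1"], selected["name"]
    )
]
```

The resulting texts list can be passed to adjust_text the same way as before.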

3 Comments

bismo Over a year ago

Awesome! This worked. Now, is there a possible way to extend the distance between the text and the points?

Tom Over a year ago

Does calling adjust_text not work here? I'm not familiar with that module.

bismo Over a year ago

It does; I just need to figure out how to get a line that leads the text to its correct point now. I will check the documentation.
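On the connecting lines: adjust_text accepts an arrowprops dictionary and uses it to draw a line from each moved label back to its point. Below is a minimal sketch under a few assumptions: the dataframe is a made-up stand-in for passers_pca_kmeans, the arrow styling is arbitrary, and the import is wrapped in try/except only so the snippet still runs where adjustText is not installed. The keyword arguments that control how far labels are pushed away have changed between adjustText releases, so the sketch leaves them out; check the documentation for the installed version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

try:
    from adjustText import adjust_text
except ImportError:  # adjustText not installed; labels stay where they are
    adjust_text = None

# Hypothetical stand-in for passers_pca_kmeans (invented values).
df = pd.DataFrame({
    "Component 1": [0.5, -1.2, 2.0, 0.1, -0.7],
    "Component 2": [1.0, 0.3, -0.4, 2.2, -1.5],
    "Segment": ["first", "second", "first", "third", "fourth"],
    "name": ["A", "B", "C", "D", "E"],
})

mask = df["Segment"] == "first"  # rows to label

fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(df["Component 2"], df["Component 1"])

texts = [
    ax.text(x0, y0, name, ha="right", va="bottom")
    for x0, y0, name in zip(
        df.loc[mask, "Component 2"], df.loc[mask, "Component 1"], df.loc[mask, "name"]
    )
]

if adjust_text is not None:
    # arrowprops asks adjust_text to connect each label to its original point.
    adjust_text(texts, ax=ax, arrowprops=dict(arrowstyle="-", color="gray", lw=0.5))
```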
