Annotate data points while plotting from Pandas DataFrame

Question

I would like to annotate the data points with their values next to the points on the plot. The examples I found only deal with x and y as vectors. However, I would like to do this for a pandas DataFrame that contains multiple columns.

ax = plt.figure().add_subplot(1, 1, 1)
df.plot(ax = ax)
plt.show()

What is the best way to annotate all the points for a multi-column DataFrame?

Community · Accepted Answer · 2017-05-23 10:30:49Z

64

Here's a (very) slightly slicker version of Dan Allan's answer:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string

df = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)}, 
                  index=list(string.ascii_lowercase[:10]))

Which gives:

          x         y
a  0.541974  0.042185
b  0.036188  0.775425
c  0.950099  0.888305
d  0.739367  0.638368
e  0.739910  0.596037
f  0.974529  0.111819
g  0.640637  0.161805
h  0.554600  0.172221
i  0.718941  0.192932
j  0.447242  0.172469

And then:

fig, ax = plt.subplots()
df.plot('x', 'y', kind='scatter', ax=ax)

for k, v in df.iterrows():
    ax.annotate(k, v)

Finally, if you're in interactive mode you might need to refresh the plot:

fig.canvas.draw()

Which produces: Boring scatter plot

Or, since that looks incredibly ugly, you can beautify things a bit pretty easily:

cmap = plt.get_cmap('Spectral')  # cm.get_cmap was removed in Matplotlib 3.9
df.plot('x', 'y', kind='scatter', ax=ax, s=120, linewidth=0, 
        c=range(len(df)), colormap=cmap)

for k, v in df.iterrows():
    ax.annotate(k, v,
                xytext=(10,-5), textcoords='offset points',
                family='sans-serif', fontsize=18, color='darkslategrey')

Which looks a lot nicer: Nice scatter plot

edited May 23, 2017 at 10:30

CommunityBot


answered Sep 23, 2014 at 16:50

LondonRob



6 Comments

Little Bobby Tables Over a year ago

Beautiful! (The second plot as you said...)

st19297 Over a year ago

@LondonRob, is there any way you can tell me how we can annotate only every nth marker?

LondonRob Over a year ago

@st19297 Create a new question! And include a link to this answer (see the "share" link) so people know where you're starting from!
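For readers who land here with the same question, one possible approach is to slice the DataFrame before looping. This is only a sketch: the data, the seed, and the step n = 3 are made up for illustration and are not from the thread.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with a fixed seed, standing in for the df in the answer.
rng = np.random.default_rng(1)
df = pd.DataFrame({'x': rng.random(10), 'y': rng.random(10)},
                  index=list('abcdefghij'))

fig, ax = plt.subplots()
df.plot('x', 'y', kind='scatter', ax=ax)

n = 3  # assumed step: label rows 0, 3, 6, 9
for k, v in df.iloc[::n].iterrows():
    # All points are plotted, but only every nth row gets a label.
    ax.annotate(k, (v['x'], v['y']),
                xytext=(10, -5), textcoords='offset points')
```

Call plt.show() afterwards if you are not in interactive mode.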

Howard Lovatt Over a year ago

The problem I have had with this method is that the labels get truncated if they go outside the plot area. Any idea how to fix this?

PatrickT Over a year ago

@HowardLovatt, you can reset the axis limits with xlim=[0,1] and ax.set(xlim=xlim, ylim=ylim) and if you need to calculate the limits dynamically, you can start with df[x].max() and adjust by multiplying by 0.9 or 1.1, say.
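A rough sketch of the dynamic-limits idea follows. Note that it pads by a fraction of the data range instead of multiplying the max by 0.9 or 1.1, because multiplying moves the limit the wrong way for negative values. The helper name padded_limits, the 10% padding, and the sample data are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with a fixed seed, standing in for the df in the answer.
rng = np.random.default_rng(0)
df = pd.DataFrame({'x': rng.random(10), 'y': rng.random(10)},
                  index=list('abcdefghij'))

fig, ax = plt.subplots()
df.plot('x', 'y', kind='scatter', ax=ax)
for k, v in df.iterrows():
    ax.annotate(k, (v['x'], v['y']),
                xytext=(10, -5), textcoords='offset points')

def padded_limits(values, frac=0.1):
    """Return (low, high) widened by `frac` of the data range on each side."""
    lo, hi = values.min(), values.max()
    pad = (hi - lo) * frac
    return lo - pad, hi + pad

# Widen both axes so labels near the edge have room inside the plot area.
xlim = padded_limits(df['x'])
ylim = padded_limits(df['y'])
ax.set(xlim=xlim, ylim=ylim)
```

How much padding you need depends on the font size and the xytext offset, so treat frac as something to tune.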

p_mcp · Accepted Answer · 2019-11-12 11:31:29Z

43

Do you want to use one of the other columns as the text of the annotation? This is something I did recently.

Starting with some example data

In [1]: df
Out[1]: 
           x         y val
 0 -1.015235  0.840049   a
 1 -0.427016  0.880745   b
 2  0.744470 -0.401485   c
 3  1.334952 -0.708141   d
 4  0.127634 -1.335107   e

Plot the points. I plot y against x, in this example.

ax = df.set_index('x')['y'].plot(style='o')

Write a function that loops over x, y, and the value to annotate beside the point.

def label_point(x, y, val, ax):
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x'], point['y'], str(point['val']))

label_point(df.x, df.y, df.val, ax)

plt.draw()  # bare draw() only exists in a pylab session; needs import matplotlib.pyplot as plt

Annotated Points
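The same looping idea can probably be stretched to the multi-column line plot in the question, where every column is its own line. The sketch below is not from the original answer: the column names and values are invented, and it assumes a numeric index so that the index values are the x coordinates.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical multi-column data; each column becomes one line.
df = pd.DataFrame({'a': [1.0, 2.5, 2.0, 3.5],
                   'b': [0.5, 1.0, 3.0, 2.0]},
                  index=[0, 1, 2, 3])

fig, ax = plt.subplots()
df.plot(ax=ax, marker='o')

# Visit every (index, value) pair of every column and write the value
# slightly up and to the right of its point.
for col in df.columns:
    for x, y in df[col].items():
        ax.annotate(f'{y:.2f}', (x, y),
                    xytext=(5, 5), textcoords='offset points')
```

With a non-numeric index (strings, for example) the x positions on the line plot are no longer the index values themselves, so the (x, y) pairs would need adjusting.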

edited Nov 12, 2019 at 11:31

p_mcp


answered Apr 9, 2013 at 19:59

Dan Allan



tozCSS · Accepted Answer · 2020-01-21 18:14:30Z

36

Let's assume your df has multiple columns, three of which are x, y, and lbl. To annotate your (x,y) scatter plot with lbl, simply:

ax = df.plot(kind='scatter',x='x',y='y')
df[['x','y','lbl']].apply(lambda row: ax.text(*row),axis=1);

edited Jan 21, 2020 at 18:14

answered Sep 7, 2016 at 16:05

tozCSS


1 Comment

Nelson Auner Over a year ago

For the first line, current pandas would use df.plot('x', 'y', kind='scatter')

Community · Accepted Answer · 2020-06-20 09:12:55Z

I found the previous answers quite helpful, especially LondonRob's example that improved the layout a bit.

The only thing that bothered me is that I don't like pulling data out of DataFrames to then loop over them. Seems a waste of the DataFrame.

Here is an alternative that avoids the loop using .apply(), and includes the nicer-looking annotations (I thought the color scale was a bit overkill and couldn't get the colorbar to go away):

ax = df.plot('x', 'y', kind='scatter', s=50 )

def annotate_df(row):  
    ax.annotate(row.name, row.values,
                xytext=(10,-5), 
                textcoords='offset points',
                size=18, 
                color='darkslategrey')
    
_ = df.apply(annotate_df, axis=1)


Edit Notes

I edited my code example recently. Originally it used the same:

fig, ax = plt.subplots()

as the other posts to expose the axes, however this is unnecessary and makes the:

import matplotlib.pyplot as plt

line also unnecessary.

Also note:

If you are trying to reproduce this example and your plots don't have the points in the same place as any of ours, it may be because the DataFrame was using random values. It probably would have been less confusing if we'd used a fixed data table or a random seed.
Depending on the points, you may have to play with the xytext values to get better placements.
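To make the two notes above concrete, here is a sketch of the same .apply() example with a seeded generator, so the points land in the same place on every run. The seed value 42 and the offset parameter are arbitrary choices for illustration, not part of the original code.

```python
import string

import numpy as np
import pandas as pd

# A seeded generator gives the same "random" points every time.
rng = np.random.default_rng(42)  # assumed seed, any fixed integer works
df = pd.DataFrame({'x': rng.random(10), 'y': rng.random(10)},
                  index=list(string.ascii_lowercase[:10]))

ax = df.plot('x', 'y', kind='scatter', s=50)

def annotate_df(row, offset=(10, -5)):
    # row.name is the index label, row.values the (x, y) pair.
    # offset is the xytext shift in points; tweak it for better placement.
    ax.annotate(row.name, row.values,
                xytext=offset,
                textcoords='offset points',
                size=18,
                color='darkslategrey')

_ = df.apply(annotate_df, axis=1)
```

Passing a different offset, for example df.apply(annotate_df, axis=1, offset=(-20, 8)), is one way to play with the label placement.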
