58

I would like to annotate the data points with their values next to the points on the plot. The examples I found only deal with x and y as vectors. However, I would like to do this for a pandas DataFrame that contains multiple columns.

import matplotlib.pyplot as plt

ax = plt.figure().add_subplot(1, 1, 1)
df.plot(ax=ax)
plt.show()

What is the best way to annotate all the points for a multi-column DataFrame?

4 Answers

64

Here's a (very) slightly slicker version of Dan Allan's answer:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import string

df = pd.DataFrame({'x':np.random.rand(10), 'y':np.random.rand(10)}, 
                  index=list(string.ascii_lowercase[:10]))

Which gives:

          x         y
a  0.541974  0.042185
b  0.036188  0.775425
c  0.950099  0.888305
d  0.739367  0.638368
e  0.739910  0.596037
f  0.974529  0.111819
g  0.640637  0.161805
h  0.554600  0.172221
i  0.718941  0.192932
j  0.447242  0.172469

And then:

fig, ax = plt.subplots()
df.plot('x', 'y', kind='scatter', ax=ax)

for k, v in df.iterrows():
    ax.annotate(k, v)

Finally, if you're in interactive mode you might need to refresh the plot:

fig.canvas.draw()

Which produces: [figure: boring scatter plot]

Or, since that looks incredibly ugly, you can beautify things a bit pretty easily:

cmap = plt.get_cmap('Spectral')  # cm.get_cmap was removed in Matplotlib 3.9
df.plot('x', 'y', kind='scatter', ax=ax, s=120, linewidth=0, 
        c=range(len(df)), colormap=cmap)

for k, v in df.iterrows():
    ax.annotate(k, v,
                xytext=(10,-5), textcoords='offset points',
                family='sans-serif', fontsize=18, color='darkslategrey')

Which looks a lot nicer: [figure: nice scatter plot]
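The same loop carries over to the line plot in the question, where each column is its own series. Here is a minimal sketch, assuming a small hypothetical DataFrame (columns a and b, seeded random values) and a two-decimal label format chosen only for illustration:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: two series sharing one index (seeded so the plot is repeatable).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 2)), columns=['a', 'b'])

fig, ax = plt.subplots()
df.plot(ax=ax, marker='o')

# One annotation per cell: x is the index value, y is the cell value.
for col in df.columns:
    for x, y in df[col].items():
        ax.annotate(f'{y:.2f}', (x, y),
                    xytext=(0, 6), textcoords='offset points',
                    ha='center', fontsize=8)
```

Call plt.show() afterwards to display it. With many rows the labels will overlap, so you may want to annotate only a subset.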


6 Comments

Beautiful! (The second plot as you said...)
@LondonRob, is there any way to annotate only every nth marker?
@st19297 Create a new question! And include a link to this answer (see the "share" link) so people know where you're starting from!
The problem I have had with this method is that the labels get truncated if they go outside the plot area. Any idea how to fix this?
@HowardLovatt, you can reset the axis limits, e.g. xlim=[0,1] followed by ax.set(xlim=xlim, ylim=ylim). If you need to calculate the limits dynamically, you can start with df[x].max() and adjust by multiplying by 0.9 or 1.1, say.
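A rough sketch of that suggestion, assuming a DataFrame with columns x and y. The helper padded_limits and its 10% padding are hypothetical, and it pads by a fraction of the data range rather than multiplying the maximum, so it also behaves for negative values:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data; the index supplies the label text.
df = pd.DataFrame({'x': [0.1, 0.5, 0.9], 'y': [0.2, 0.8, 0.4]},
                  index=['a', 'b', 'c'])

fig, ax = plt.subplots()
df.plot('x', 'y', kind='scatter', ax=ax)
for k, v in df.iterrows():
    ax.annotate(k, (v['x'], v['y']),
                xytext=(10, -5), textcoords='offset points')

def padded_limits(values, pad=0.1):
    """Return (low, high) widened by `pad` times the data range on each side."""
    lo, hi = values.min(), values.max()
    span = hi - lo
    return lo - pad * span, hi + pad * span

# Widen both axes so labels near the edge have room inside the plot area.
ax.set(xlim=padded_limits(df['x']), ylim=padded_limits(df['y']))
```

Long labels may still need a larger pad, since the label offset is in points, not data units.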
43

Do you want to use one of the other columns as the text of the annotation? This is something I did recently.

Starting with some example data

In [1]: df
Out[1]: 
           x         y val
 0 -1.015235  0.840049   a
 1 -0.427016  0.880745   b
 2  0.744470 -0.401485   c
 3  1.334952 -0.708141   d
 4  0.127634 -1.335107   e

Plot the points. I plot y against x, in this example.

ax = df.set_index('x')['y'].plot(style='o')

Write a function that loops over x, y, and the value to annotate beside the point.

def label_point(x, y, val, ax):
    a = pd.concat({'x': x, 'y': y, 'val': val}, axis=1)
    for i, point in a.iterrows():
        ax.text(point['x'], point['y'], str(point['val']))

label_point(df.x, df.y, df.val, ax)

plt.draw()  # assumes: import matplotlib.pyplot as plt

[figure: annotated points]


36

Let's assume your df has multiple columns, three of which are x, y, and lbl. To annotate your (x,y) scatter plot with lbl, simply:

ax = df.plot(kind='scatter',x='x',y='y')
df[['x','y','lbl']].apply(lambda row: ax.text(*row),axis=1);
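The one-liner works because the selected columns arrive in the (x, y, text) order that ax.text expects. A self-contained sketch of the same idea with the unpacking made explicit, assuming hypothetical data (the column other is only there to show that extra columns are ignored):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data with an extra column that is not plotted.
df = pd.DataFrame({'x': [1.0, 2.0, 3.0],
                   'y': [2.0, 1.0, 3.0],
                   'other': [10, 20, 30],
                   'lbl': ['p', 'q', 'r']})

ax = df.plot(kind='scatter', x='x', y='y')

# Selecting the columns in (x, y, text) order matches the signature of ax.text.
for x, y, lbl in df[['x', 'y', 'lbl']].itertuples(index=False):
    ax.text(x, y, str(lbl))
```

Reordering the column list would pass the wrong values as coordinates, so keep x and y first.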

1 Comment

For the first line, current pandas would use df.plot('x', 'y', kind='scatter')

13

I found the previous answers quite helpful, especially LondonRob's example that improved the layout a bit.

The only thing that bothered me is that I don't like pulling data out of DataFrames to then loop over them. Seems a waste of the DataFrame.

Here is an alternative that avoids the loop by using .apply() and keeps the nicer-looking annotations (I thought the color scale was a bit overkill, and I couldn't get the colorbar to go away):

ax = df.plot('x', 'y', kind='scatter', s=50 )

def annotate_df(row):  
    ax.annotate(row.name, row.values,
                xytext=(10,-5), 
                textcoords='offset points',
                size=18, 
                color='darkslategrey')
    
_ = df.apply(annotate_df, axis=1)


Edit Notes

I edited my code example recently. Originally it used the same:

fig, ax = plt.subplots()

as the other posts to expose the axes, however this is unnecessary and makes the:

import matplotlib.pyplot as plt

line also unnecessary.

Also note:

  • If you are trying to reproduce this example and your plots don't have the points in the same place as any of ours, it may be because the DataFrame was using random values. It probably would have been less confusing if we'd used a fixed data table or a random seed.
  • Depending on the points, you may have to play with the xytext values to get better placements.
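To make both notes concrete, here is one way the example could be made reproducible. It is only a sketch: the seed value, the offset parameter, and the name annotate_row are my own choices for illustration, not part of the earlier answers.

```python
import string
import numpy as np
import pandas as pd

# A fixed seed makes the "random" points identical on every run.
rng = np.random.default_rng(42)
df = pd.DataFrame({'x': rng.random(10), 'y': rng.random(10)},
                  index=list(string.ascii_lowercase[:10]))

ax = df.plot('x', 'y', kind='scatter', s=50)

def annotate_row(row, offset=(10, -5)):
    # row.name is the index label; offset is in points relative to the marker.
    ax.annotate(row.name, (row['x'], row['y']),
                xytext=offset, textcoords='offset points',
                size=12, color='darkslategrey')

_ = df.apply(annotate_row, axis=1)
```

To try a different placement, pass the offset through apply, e.g. df.apply(annotate_row, axis=1, offset=(5, 5)).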
