I have a pandas dataframe including the following columns:

label = ('A' , 'D' , 'K', 'L', 'P')
x = (1 , 4 , 9, 6, 4)
y = (2 , 6 , 5, 8, 9)
plot_id = (1 , 1 , 2, 2, 3)

I want to create 3 separate scatter plots - one for each individual plot_id. So the first scatter plot should consist of all entries where plot_id == 1, and hence the points (1,2) and (4,6). Each data point should be labelled by label. Hence the first plot should have the labels A and D.

I understand I can use annotate to label, and I am familiar with for loops. But I have no idea how to combine the two.

I wish I could post a better code snippet of what I have done so far - but it's just terrible. Here it is:

for i in range(len(df.plot_id)):
    plt.scatter(df.x[i],df.y[i])
    plt.show()

That's all I got - unfortunately. Any ideas on how to proceed?

  • what is the link between plot_id and label ? Commented Dec 5, 2016 at 16:42
  • Sorry, I edited the question while you were commenting. I am basically trying to make 3 plots - one for each individual plot_id. Commented Dec 5, 2016 at 16:44
  • then label column is useless ... Commented Dec 5, 2016 at 16:45
  • No. I want to label/annotate the data entries (or glyphs if you will) with label. Commented Dec 5, 2016 at 16:48
  • You need to be very precise about the following: How many plots do you want to create? How many points do you want each plot to have? Where should the labels appear in the plot? Is it correct that you want to have exactly one point per plot? Commented Dec 5, 2016 at 16:48

3 Answers

4

updated answer
save separate image files

def annotate(row, ax):
    ax.annotate(row.label, (row.x, row.y),
                xytext=(10, -5), textcoords='offset points')

for pid, grp in df.groupby('plot_id'):
    ax = grp.plot.scatter('x', 'y')
    grp.apply(annotate, ax=ax, axis=1)
    plt.savefig('{}.png'.format(pid))
    plt.close()

Output files: 1.png, 2.png, 3.png (one image per plot_id; images not shown here)
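For anyone who wants the same idea as a single script, here is a minimal sketch using plain matplotlib axes instead of grp.plot.scatter. The function name save_scatter_per_group, the out_dir folder "plots", the per-plot title and the Agg backend are my own assumptions for illustration and are not part of the answer above.

```python
import os

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, figures are only written to files
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "label": ["A", "D", "K", "L", "P"],
    "x": [1, 4, 9, 6, 4],
    "y": [2, 6, 5, 8, 9],
    "plot_id": [1, 1, 2, 2, 3],
})


def save_scatter_per_group(data, out_dir="plots"):
    """Hypothetical helper: write one annotated scatter plot per plot_id."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for pid, grp in data.groupby("plot_id"):
        fig, ax = plt.subplots()
        ax.scatter(grp["x"], grp["y"])
        # Label every point of this group, slightly offset from the marker
        for row in grp.itertuples(index=False):
            ax.annotate(row.label, (row.x, row.y),
                        xytext=(10, -5), textcoords="offset points")
        ax.set_title("plot_id = {}".format(pid))
        path = os.path.join(out_dir, "{}.png".format(pid))
        fig.savefig(path)
        plt.close(fig)  # release the figure; this matters with many plot_ids
        saved.append(path)
    return saved


saved_files = save_scatter_per_group(df)
```

Closing each figure right after saving keeps memory use flat, which should help when there are many plot_ids (54 were mentioned in the comments).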

old answer
for those who want something like this

def annotate(row, ax):
    ax.annotate(row.label, (row.x, row.y),
                xytext=(10, -5), textcoords='offset points')

fig, axes = plt.subplots(df.plot_id.nunique(), 1)
for i, (pid, grp) in enumerate(df.groupby('plot_id')):
    ax = axes[i]
    grp.plot.scatter('x', 'y', ax=ax)
    grp.apply(annotate, ax=ax, axis=1)
fig.tight_layout()


setup

import pandas as pd
import matplotlib.pyplot as plt

label = ('A', 'D', 'K', 'L', 'P')
x = (1, 4, 9, 6, 4)
y = (2, 6, 5, 8, 9)
plot_id = (1, 1, 2, 2, 3)

df = pd.DataFrame(dict(label=label, x=x, y=y, plot_id=plot_id))

4 Comments

since there are 54 plot_ids, I don't think subplots might be a good idea. am I wrong ?
I'm sorry, you weren't clear. You said you wanted separate plots.
Yes, indeed. I need 54 individual plots. I will try to be more clear next time!
Great solution! Thank you!
1

Here is a simple way to deal with your problem:

# list() keeps the pairs reusable: in Python 3 a bare zip() is exhausted after one loop
zipped = list(zip(zip(zip(df.x, df.y), df.plot_id), df.label))
# Result : [(((1, 2), 1), 'A'),
#           (((4, 6), 1), 'D'),
#           (((9, 5), 2), 'K'),
#           (((6, 8), 2), 'L'),
#           (((4, 9), 3), 'P')]

To retrieve the positions, the plot index and the labels, you can loop as below:

for (pos, plot), label in zipped:
    ...
    print pos
    print plot
    print label

Now here is what you can do in your case:

import matplotlib.pyplot as plt

for (pos, plot), label in zipped:
    plt.figure(plot)
    x, y = pos
    plt.scatter(x, y)
    plt.annotate(label, xy=pos)

This creates as many figures as there are distinct plot_id values. Each figure shows the scatter plot of the points that share that plot_id, and each point is annotated with its label.
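Since saving the figures from the loop came up, here is a minimal sketch of one way to do it under Python 3. It flattens the nested zip into a single zip, and converts each plot_id to a plain int before passing it to plt.figure; that conversion is a defensive assumption on my part, not something the code above requires. The file name pattern zip_plot_<id>.png and the Agg backend are also just illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, figures are only written to files
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "label": ["A", "D", "K", "L", "P"],
    "x": [1, 4, 9, 6, 4],
    "y": [2, 6, 5, 8, 9],
    "plot_id": [1, 1, 2, 2, 3],
})

figures = {}  # plot_id -> Figure, so each figure is saved exactly once
for x, y, pid, label in zip(df.x, df.y, df.plot_id, df.label):
    pid = int(pid)         # plain Python int as the figure number
    fig = plt.figure(pid)  # creates the figure on first use, reuses it afterwards
    figures[pid] = fig
    ax = fig.gca()
    ax.scatter(x, y)
    ax.annotate(label, xy=(x, y))

# Save after the loop, once every point has been drawn on its figure
saved = []
for pid, fig in figures.items():
    filename = "zip_plot_{}.png".format(pid)  # hypothetical naming scheme
    fig.savefig(filename)
    plt.close(fig)
    saved.append(filename)
```

Saving only after the first loop finishes avoids writing a file that is later overwritten by a version with more points on it.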

6 Comments

Wow! This is great! Is there a way to save the plots on the loop too? I tried to adapt the code and save but unfortunately replace too...
I get a figure for each pos . So given the example brought forward, I get 6 figures. How do I combine them into 3?
@Rachel Are you sure that you get a figure for each pos ? It works perfectly for me ...
Yes. Your print command suggests you use Python 2 whilst I use python 3? Maybe that's why?
can you edit your question with your new piece of code and the variables you use ? I'll check it out
0

This is a function to create these plots (based on @piRSquared's answer):

def plotter2(data, x, y, grp, lbl):

    def annotate(row, ax):
        ax.annotate(row[lbl], (row[x], row[y]),
                    xytext=(3, 0), textcoords='offset points')

    # 'sub' avoids shadowing the 'grp' argument (the name of the grouping column)
    for pid, sub in data.groupby(grp):
        ax = sub.plot.scatter(x, y)
        sub.apply(annotate, ax=ax, axis=1)
        # save before show(): once the window is closed the saved figure can be empty
        plt.savefig('{}.png'.format(pid))
        plt.show()
