Pandas Scatterplot with colorcoded points

Question

I'd like to make a scatter plot from a DataFrame in which each point's color depends on how often that value occurs. As an example, I have the following DataFrame, built from two lists of numeric values:

df = pd.DataFrame({'width': image_widths, 'height': image_heights})
df.head(10)
   height  width
0    1093    640
1    1136    639
2    1095    640
3    1136    639
4    1095    640
5    1100    640
6    1136    640
7    1136    639
8    1136    640
9    1031    640

As you can see, some value pairs occur multiple times. For example, (1095/640) occurs at index 2 and 4. How do I give this dot a color representing "two occurrences"? It would be even better if the color were picked automatically from a continuous spectrum, as in a colorbar plot, so that the shade alone gives an impression of the frequency without having to look up manually what each color represents.

As an alternative to coloring, I would also appreciate having the frequency of occurrences encoded as the radius of the dots.

EDIT:

To clarify my question: I figured out that df.groupby(['width','height']).size() gives me the count of every combination. What I lack is the skill to link this information to the color (or size) of the dots in the plot.

You can assign each point a red and green value based on height and width, and a blue value (or alpha) based on the frequency. Alternatively, you can play with the fill color, stroke color, and alpha for each point. There are tons of options; it's really up to you.
– alec_djinn, Commented Jul 27, 2017 at 9:54
@alec_djinn: There are more than two values for width, so I would have to assign a lot of values. It's unfortunate that only those two values appear in the example. Also, there's a high chance that more points with unseen dimensions will be appended in the future. But thanks for the comment anyway.
– muuh, Commented Jul 27, 2017 at 9:56

Stop harming Monica · Accepted Answer · 2019-01-28 23:35:54Z

So let's make this a true Minimal, Complete, and Verifiable example:

import matplotlib.pyplot as plt
import pandas as pd

image_heights = [1093, 1136, 1095, 1136, 1095, 1100, 1136, 1136, 1136, 1031]
image_widths = [640, 639, 640, 639, 640, 640, 640, 639, 640, 640]
df = pd.DataFrame({'width': image_widths, 'height': image_heights})
print(df)

   width  height
0    640    1093
1    639    1136
2    640    1095
3    639    1136
4    640    1095
5    640    1100
6    640    1136
7    639    1136
8    640    1136
9    640    1031

You want the sizes (counts) along with the widths and heights in a DataFrame:

plot_df = df.groupby(['width','height']).size().reset_index(name='count')
print(plot_df)

   width  height  count
0    639    1136      3
1    640    1031      1
2    640    1093      1
3    640    1095      2
4    640    1100      1
5    640    1136      2
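If you would rather keep one row per original image instead of one row per unique pair, a possible variant (not part of the original answer) is to attach the count to every row with a grouped transform. This is only a sketch using the same example data; the column name 'count' is an arbitrary choice:

```python
import pandas as pd

image_heights = [1093, 1136, 1095, 1136, 1095, 1100, 1136, 1136, 1136, 1031]
image_widths = [640, 639, 640, 639, 640, 640, 640, 639, 640, 640]
df = pd.DataFrame({'width': image_widths, 'height': image_heights})

# For each row, look up how many rows share its (width, height) pair.
# transform('size') returns a result aligned with the original index.
df['count'] = df.groupby(['width', 'height'])['width'].transform('size')
print(df)
```

With this layout duplicate points are drawn on top of each other, so the aggregated plot_df above is usually the better input for plotting.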

The colors and sizes in a scatter plot are controlled by the c and s keywords if you use DataFrame.plot.scatter:

plot_df.plot.scatter(x='height', y='width', s=10 * plot_df['count']**2,
                     c='count', cmap='viridis')
plt.show()  # display the figure when running as a script
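For more control over the colorbar and axis labels, the same plot can be sketched with matplotlib directly. This is an alternative to the pandas call above, not the answer's own code. Note that matplotlib's s is the marker area in points squared, so scaling it with count**2 makes the dot diameter grow roughly linearly with the count; the factor 10 is just a visual choice carried over from the answer.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt
import pandas as pd

image_heights = [1093, 1136, 1095, 1136, 1095, 1100, 1136, 1136, 1136, 1031]
image_widths = [640, 639, 640, 639, 640, 640, 640, 639, 640, 640]
df = pd.DataFrame({'width': image_widths, 'height': image_heights})

# One row per unique (width, height) pair, with its frequency.
plot_df = df.groupby(['width', 'height']).size().reset_index(name='count')

fig, ax = plt.subplots()
points = ax.scatter(plot_df['height'], plot_df['width'],
                    s=10 * plot_df['count'] ** 2,   # marker area in points^2
                    c=plot_df['count'], cmap='viridis')
fig.colorbar(points, ax=ax, label='count')  # continuous color scale for the frequency
ax.set_xlabel('height')
ax.set_ylabel('width')
```

Call plt.show() or fig.savefig(...) afterwards, depending on whether you work interactively or want an image file.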

edited Jan 28, 2019 at 23:35

answered Jul 27, 2017 at 11:18

2 Comments

MaxU - stand with Ukraine Over a year ago

very neat answer ! ++ :)

Stop harming Monica Over a year ago

@tontus It does for me. Maybe ask another question with a minimal reproducible example. I will be happy to take a look if you drop a link here.
