I'd like to make a scatter plot from a DataFrame, where each point is colored according to how often its value occurs. As an example, I have the following DataFrame, built from two lists of numeric values:

df = pd.DataFrame({'width': image_widths, 'height': image_heights})
df.head(10)
   height  width
0    1093    640
1    1136    639
2    1095    640
3    1136    639
4    1095    640
5    1100    640
6    1136    640
7    1136    639
8    1136    640
9    1031    640

As you can see, some value pairs occur multiple times. For example, (1095/640) occurs at indices 2 and 4. How do I give this dot a color representing "two occurrences"? It would be even better if the color were picked automatically from a continuous spectrum, as in a colorbar plot, so that the shade alone gives an impression of the frequency without having to look up what each color represents.

As an alternative to coloring, I would also appreciate having the frequency of occurrences encoded as the radius of the dots.

EDIT:

To clarify my question: I figured out that df.groupby(['width','height']).size() gives me the count of each combination. What I lack is the skill to link this information to the color (or size) of the dots in the plot.

  • You can assign each point a Red and a Green value based on height and width, and a Blue value (or alpha) based on the frequency. Alternatively, you can play with the fill color, stroke color, and alpha for each point. There are tons of options, it's really up to you. Commented Jul 27, 2017 at 9:54
  • @alec_djinn: There are more than two values for width, so I would have to assign a lot of values. It's unfortunate that only those two values appear in the example. There's also a high chance that more points with unseen dimensions will be appended in the future. But thanks for the comment anyway. Commented Jul 27, 2017 at 9:56
  • There are 256 values for each R, G, B channel... Commented Jul 27, 2017 at 11:13

1 Answer

So let's make this a true Minimal, Complete, and Verifiable example:

import matplotlib.pyplot as plt
import pandas as pd

image_heights = [1093, 1136, 1095, 1136, 1095, 1100, 1136, 1136, 1136, 1031]
image_widths = [640, 639, 640, 639, 640, 640, 640, 639, 640, 640]
df = pd.DataFrame({'width': image_widths, 'height': image_heights})
print(df)

   width  height
0    640    1093
1    639    1136
2    640    1095
3    639    1136
4    640    1095
5    640    1100
6    640    1136
7    639    1136
8    640    1136
9    640    1031

You want the sizes (counts) along with the widths and heights in a DataFrame:

plot_df = df.groupby(['width','height']).size().reset_index(name='count')
print(plot_df)

   width  height  count
0    639    1136      3
1    640    1031      1
2    640    1093      1
3    640    1095      2
4    640    1100      1
5    640    1136      2
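If you would rather keep one row per image instead of one row per unique pair, the counts can also be attached to the original frame. The following is only a sketch of that variant, assuming the same df as above; the column name 'count' is my own choice:

```python
import pandas as pd

image_heights = [1093, 1136, 1095, 1136, 1095, 1100, 1136, 1136, 1136, 1031]
image_widths = [640, 639, 640, 639, 640, 640, 640, 639, 640, 640]
df = pd.DataFrame({'width': image_widths, 'height': image_heights})

# For every row, look up how many rows share the same (width, height) pair.
# transform('size') returns a Series aligned with the original index.
df['count'] = df.groupby(['width', 'height'])['width'].transform('size')
print(df)
```

Duplicated points are then drawn on top of each other, so the grouped plot_df above is usually the better input for plotting.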

The colors and sizes in a scatter plot are controlled by the c and s keywords if you use DataFrame.plot.scatter:

plot_df.plot.scatter(x='height', y='width', s=10 * plot_df['count']**2,
                     c='count', cmap='viridis')

[Image: scatter plot of width against height, with marker color and size showing the count]
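If you prefer to call matplotlib directly, roughly the same plot can be built with Axes.scatter and an explicit colorbar. This is a sketch under the same assumptions as above (same data, same 10 * count**2 size scaling); the Agg backend line is only there so that it runs without a display and can be dropped for interactive use:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

image_heights = [1093, 1136, 1095, 1136, 1095, 1100, 1136, 1136, 1136, 1031]
image_widths = [640, 639, 640, 639, 640, 640, 640, 639, 640, 640]
df = pd.DataFrame({'width': image_widths, 'height': image_heights})

# One row per unique (width, height) pair, with its number of occurrences.
plot_df = df.groupby(['width', 'height']).size().reset_index(name='count')

fig, ax = plt.subplots()
points = ax.scatter(plot_df['height'], plot_df['width'],
                    s=10 * plot_df['count'] ** 2,   # marker area grows with the count
                    c=plot_df['count'], cmap='viridis')
fig.colorbar(points, ax=ax, label='count')  # continuous color scale for the counts
ax.set_xlabel('height')
ax.set_ylabel('width')
# plt.show()  # uncomment to display the figure in an interactive session
```

Note that s is the marker area in points squared, not the radius, so the scaling factor is a matter of taste.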

2 Comments

very neat answer ! ++ :)
@tontus It does for me. Maybe ask another question with a minimal reproducible example. I will be happy to take a look if you drop a link here.
