I am trying to import data from a table into Python and plot one variable against another. I also want to group the data points by two other variables in the same table.

One of the variables (the one I want to map to colour) has only 3 options. The other (the one I want to map to marker shape) has only 5. I can easily group the data by either of them. The issue comes with plotting, because not all of the shape groups contain all 3 options of the "colour" variable. I can get the scatter plot to show shapes or colours on their own easily; it is when I combine them that I have an issue.

At the moment I can get the colours plotted, but there are two markers for each data point: one with the correct shape, and one that is just a standard point. If I remove what is causing the double points, however, the colours are no longer correct.

This is my current code (with example data). I have given the colour variable letters, but the real data is just as simple:

import matplotlib.pyplot as plt
import numpy as np

r = np.array([600, 2000, 980, 1770, 920, 1100, 220])
t = np.array([2.7, 12.67, 10.54, 1.3, 16.1, 0.92, 13.56])
spectra_type = np.array(['A', 'A', 'B', 'A', 'C', 'B', 'A'])
spectra_num = np.array([{'A': 0, 'B': 1, 'C': 2}[i] for i in spectra_type])

i = np.array(['Shape1','Shape2','Shape3','Shape4','Shape5','Shape2','Shape4'])
shape1 = np.where(i=='Shape1')[0]
shape2 = np.where(i=='Shape2')[0]
shape3 = np.where(i=='Shape3')[0]
shape4 = np.where(i=='Shape4')[0]
shape5 = np.where(i=='Shape5')[0]


plt.figure('fig 1')
plt.xlabel('x')
plt.ylabel('y')

plt.scatter(t[shape1], r[shape1], c=spectra_num[shape1], marker='D', label='Shape1')
plt.scatter(t[shape2], r[shape2], c=spectra_num[shape2], marker='^', label='Shape2')
plt.scatter(t[shape3], r[shape3], c=spectra_num[shape3], marker='o', label='Shape3')
plt.scatter(t[shape4], r[shape4], c=spectra_num[shape4], marker='s', label='Shape4')
plt.scatter(t[shape5], r[shape5], c=spectra_num[shape5], marker='*', label='Shape5')

first_legend = plt.legend(loc='upper left')
plt.gca().add_artist(first_legend)

scatter = plt.scatter(t, r, c=spectra_num)
plt.legend(handles=scatter.legend_elements()[0], labels=['A', 'B', 'C'], title='Colour')

This gives me the following graph. As you can see, the shapes are all there but are overlaid with another "regular" marker.

Example plot from data

Any advice would be much appreciated!

  • Because you are plotting twice: once with shapes, and then the whole of t and r again, changing only the colours without specifying markers. Why plot twice? The line scatter = plt.scatter(t, r, c=spectra_num) at the end of the code draws coloured circles on top of the previous scatter points. Commented Feb 14 at 13:23
  • @Cincinnatus we discussed that previously in the comments in Staging Ground. I tried to make something work using the template code in the question; have a look and tell me what you think. I also stressed that there is probably more than one way to do it, but most if not all of them cannot be achieved by superposing two different sets of scatter plots (the five plt.scatter lines plus the final scatter = ... line). Commented Feb 14 at 13:27
  • Discussed previously in comments? No, we haven't; that is the first comment we have exchanged. Commented Feb 14 at 14:25
  • Not with you, with the OP (@gem9911). Commented Feb 14 at 14:38
  • @Cincinnatus if I remove the final plot, the colours of the data points change so they no longer match the colours they should be. I'm trying to find a way to remove it whilst still keeping the colours correct. Commented Feb 17 at 10:47

3 Answers

My take

Everything is pretty standard, except for how I compute the handles for the legend and how I place the legend outside the Axes, using the 'outside' option of the loc keyword argument of Figure.legend() (new in Matplotlib 3.7).

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
np.random.seed(20250227)

N = 80 # no. of points
N1 = 3 # no. of different properties in category 1
N2 = 5 # no. of different properties in category 2

names1 = 'Ear Eye Nose'.split()
names2 = 'Africa Asia Europe N.America S.America'.split()

d1 = dict(zip(range(N1), ['C'+str(i) for i in range(N1)]))
markers = list(Line2D.filled_markers)
np.random.shuffle(markers)
markers = markers[:N2]
d2 = dict(zip(range(N2), markers))

# fake data
x, y = np.random.rand(2, N)
cat1 = np.random.randint(N1, size=N)
cat2 = np.random.randint(N2, size=N)

fig = plt.figure(figsize=(6, 6), layout='constrained')
for c2 in range(N2):
    marker = markers[c2]
    x2 = [xx for xx, cc in zip(x, cat2) if cc==c2]
    y2 = [yy for yy, cc in zip(y, cat2) if cc==c2]
    colors = [d1[color] for color, cc in zip(cat1, cat2) if cc==c2]    
    plt.scatter(x2, y2,
                color=colors,
                marker=marker,
                )
plt.gca().set_aspect(1)
plt.xlim((-0.05, 1.05))
plt.ylim((-0.05, 1.05))
handles = [Line2D([], [],
                  color=d1[c1],
                  marker=d2[c2],
                  lw=0,
                  label=f'({names1[c1]}, {names2[c2]})'
                 )
           for c2 in range(N2) for c1 in range(N1)]
fig.legend(handles=handles, ncols=5, loc='outside upper center', fontsize='x-small',
           title='Cat1 is mapped to different colors, Cat2 to different shapes')
plt.show()

This is the only way I found using your code; I had to modify part of the input.

I guess you could have done the same the other way round:

import matplotlib.pyplot as plt
import numpy as np

r = np.array([600, 2000, 980, 1770, 920, 1100, 220])
t = np.array([2.7, 12.67, 10.54, 1.3, 16.1, 0.92, 13.56])

spectra_type = np.array(['red', 'red', 'blue', 'red', 'yellow', 'blue', 'red'])
spectra_num = np.array([{'red': 0, 'blue': 1, 'yellow': 2}[i] for i in spectra_type])

print(spectra_num)

i = np.array(['Shape1','Shape2','Shape3','Shape4','Shape5','Shape2','Shape4'])
shape1 = np.where(i=='Shape1')[0]
shape2 = np.where(i=='Shape2')[0]
shape3 = np.where(i=='Shape3')[0]
shape4 = np.where(i=='Shape4')[0]
shape5 = np.where(i=='Shape5')[0]

print(shape1, type(shape1))

print(spectra_num[shape1])
print(spectra_num[shape2])
print(spectra_num[shape3])
print(spectra_num[shape4])
print(spectra_num[shape5])


plt.figure('fig 1')
plt.xlabel('x')
plt.ylabel('y')

plt.scatter(t[shape1], r[shape1], c=spectra_type[shape1], marker='D', label='Shape1')
plt.scatter(t[shape2], r[shape2], c=spectra_type[shape2], marker='^', label='Shape2')
plt.scatter(t[shape3], r[shape3], c=spectra_type[shape3], marker='o', label='Shape3')
plt.scatter(t[shape4], r[shape4], c=spectra_type[shape4], marker='s', label='Shape4')
plt.scatter(t[shape5], r[shape5], c=spectra_type[shape5], marker='*', label='Shape5')

first_legend = plt.legend(loc='upper center')
# Draw every shape entry in the first legend in black
for handle in first_legend.legend_handles:
    handle.set_facecolor('black')


plt.gca().add_artist(first_legend)

red = plt.Circle((0, 0), 0.1, color='red')
blue = plt.Circle((0, 0), 0.1, color='blue')
yellow = plt.Circle((0, 0), 0.1, color='yellow')

plt.legend(handles=[red, blue, yellow], labels=['red', 'blue', 'yellow'], title='Colour')


# scatter = plt.scatter(t, r, c=spectra_num)
# plt.legend(handles=scatter.legend_elements()[0], labels=['A', 'B', 'C'], title='Colour')

output:

enter image description here

I guess there is more than one way to do it, but most if not all of them cannot be achieved by superposing two different sets of scatter plots (the five plt.scatter lines plus the final scatter = ... line).

Maybe someone more knowledgeable will step in
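
One possible way to keep the numeric colour codes from the question and still drop the final overlaid scatter call, sketched under assumptions rather than taken from any answer here: when c is numeric, each scatter call scales its own values to the colormap independently, so a group that contains only 'C' no longer gets the colour that 'C' has in the full data set. Passing the same Normalize instance to every call fixes the mapping. The names codes, markers, norm and cmap, and the choice of the viridis colormap, are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize
from matplotlib.lines import Line2D

r = np.array([600, 2000, 980, 1770, 920, 1100, 220])
t = np.array([2.7, 12.67, 10.54, 1.3, 16.1, 0.92, 13.56])
spectra_type = np.array(['A', 'A', 'B', 'A', 'C', 'B', 'A'])
shape = np.array(['Shape1', 'Shape2', 'Shape3', 'Shape4',
                  'Shape5', 'Shape2', 'Shape4'])

# Assumed lookup tables (illustrative names, not from the question)
codes = {'A': 0, 'B': 1, 'C': 2}
markers = {'Shape1': 'D', 'Shape2': '^', 'Shape3': 'o',
           'Shape4': 's', 'Shape5': '*'}
spectra_num = np.array([codes[s] for s in spectra_type])

# One shared colour scale for every scatter call
cmap = plt.get_cmap('viridis')
norm = Normalize(vmin=0, vmax=len(codes) - 1)

fig, ax = plt.subplots()
ax.set_xlabel('x')
ax.set_ylabel('y')
for name, marker in markers.items():
    mask = shape == name
    ax.scatter(t[mask], r[mask], c=spectra_num[mask],
               cmap=cmap, norm=norm, marker=marker)

# Proxy handles: black markers for shapes, coloured circles for colours
shape_handles = [Line2D([], [], color='black', marker=m,
                        linestyle='none', label=name)
                 for name, m in markers.items()]
colour_handles = [Line2D([], [], color=cmap(norm(code)), marker='o',
                         linestyle='none', label=label)
                  for label, code in codes.items()]

first_legend = ax.legend(handles=shape_handles, title='Shape', loc='upper left')
ax.add_artist(first_legend)
ax.legend(handles=colour_handles, title='Colour', loc='upper right')
```

Call plt.show() afterwards to display the figure. Both legends are built from proxy Line2D handles, so nothing has to be plotted twice.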

I would recommend that you use a package like seaborn, in particular, the scatterplot function, which will simplify things for you a lot. By putting the data into a dictionary, your example can be reduced to:

import seaborn as sns

shape_markers = {
    "Shape1": "D",
    "Shape2": "^",
    "Shape3": "o",
    "Shape4": "s",
    "Shape5": "*",
}

colours = {
    "A": "C0",
    "B": "C1",
    "C": "C2",
}

data = {
    "r": [600, 2000, 980, 1770, 920, 1100, 220],
    "t": [2.7, 12.67, 10.54, 1.3, 16.1, 0.92, 13.56],
    "spectra": ["A", "A", "B", "A", "C", "B", "A"],
    "shape": ["Shape1", "Shape2", "Shape3", "Shape4", "Shape5", "Shape2", "Shape4"],
}

ax = sns.scatterplot(
    data,
    x="t",
    y="r",
    hue="spectra",
    palette=colours,
    style="shape",
    markers=shape_markers,
)

ax.figure.show()

enter image description here

1 Comment

This works well to solve the initial problem, thanks! Would you have any suggestions for separating out shapes further? Some of my data has two "shapes" in one entry, e.g. '"Shape1, Shape3"' as one entry. I was previously filtering these out by finding the comma, which they all have (not included in my code above), but is there a way to do this using your method? Ideally combining them into one "combined" shape group but keeping them individually would be fine too
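
For entries that contain two shapes, one option is to collapse them into a single group before plotting. The sketch below is plain Python and rests on assumptions: the helper shape_group, the label "Combined" and the marker "X" are hypothetical choices, and combined entries are assumed to be recognisable by their comma, as described in the comment above.

```python
def shape_group(entry, combined_label="Combined"):
    """Return the style group for one raw 'shape' entry.

    Entries naming more than one shape (comma-separated) are all
    mapped to a single combined group.
    """
    parts = [part.strip() for part in entry.split(",")]
    return combined_label if len(parts) > 1 else parts[0]


# Example raw entries, one of which names two shapes
raw_shapes = ["Shape1", "Shape2", "Shape1, Shape3", "Shape4"]
groups = [shape_group(entry) for entry in raw_shapes]

# Marker lookup extended with an entry for the combined group
shape_markers = {
    "Shape1": "D",
    "Shape2": "^",
    "Shape3": "o",
    "Shape4": "s",
    "Shape5": "*",
    "Combined": "X",  # assumed marker for combined entries
}
```

A list like groups could then replace the "shape" entry in data, with the extended shape_markers passed as markers. To keep each combination separate instead, the full stripped entry could be used as the group name, with one marker per distinct combination.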
