I'm trying to control the y-axis order on a matplotlib scatter plot, but the order in which the x and y values appear in my data causes the plot to be displayed incorrectly.

Here's some code to illustrate the problem, along with one sub-optimal attempt at a solution.

import pandas as pd
from numpy import random
import matplotlib.pyplot as plt

# make some fake data
axes = ['a', 'b', 'c', 'd']
pairs = pd.DataFrame([(x, y) for x in axes for y in axes], columns=['x', 'y'])
pairs['value'] = random.randint(100, size=16) + 100
# remove the diagonal
pairs_nodiag = pairs[pairs['x'] != pairs['y']]
# zero the values for the diagonal
pairs_diag = pairs.copy()
pairs_diag.loc[pairs_diag['x'] == pairs_diag['y'], 'value'] = 0

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(5, 3))
scatter = ax[0].scatter(x=pairs['x'], y=pairs['y'], s=pairs['value'])
scatter = ax[1].scatter(x=pairs_nodiag['x'], y=pairs_nodiag['y'], s=pairs_nodiag['value'])
scatter = ax[2].scatter(x=pairs_diag['x'], y=pairs_diag['y'], s=pairs_diag['value'])

plt.show()

3 plots

The leftmost plot is the raw data. The middle plot shows the problem: I want its y axis to be in the same order as in the leftmost plot. The rightmost plot is what I am after, produced with a sub-optimal workaround. I'm sure there is a way of controlling the ordering on the axes, but I'm not expert enough in Python yet to know exactly how to do this.

  • I think your workaround is not a workaround but it's the right way to do it. With the boolean indexing you get ('a', 'b') as a first value correctly, but of course this screws up the order. Commented Jul 6, 2020 at 11:57
  • I suppose it's a valid workaround, but in reality the data I get isn't complete, so it would be tiresome to have to patch it up to ensure the plotting works. Commented Jul 6, 2020 at 13:07
  • Unfortunately, I think you have to keep some placeholder for values that you don't want to plot. I would use None instead of 0. Commented Jul 6, 2020 at 14:20
  • @AndrewChisholm: Thanks for the question. Upvoted! Commented Jul 6, 2020 at 16:27
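
The placeholder approach discussed in the comments above can be automated for incomplete data. The following is only a rough sketch, assuming the same `x`, `y` and `value` columns as in the question; the helper name `fill_missing_pairs` is hypothetical, and it uses 0 as the placeholder (as in the question's workaround) so that missing pairs are drawn with zero size.

```python
import pandas as pd


def fill_missing_pairs(df, categories, fill_value=0):
    """Return df with a row for every (x, y) combination of categories.

    Hypothetical helper: pairs missing from df get fill_value as their
    'value', and rows come back in the order given by categories.
    """
    full_index = pd.MultiIndex.from_product(
        [categories, categories], names=["x", "y"]
    )
    return (
        df.set_index(["x", "y"])
        .reindex(full_index, fill_value=fill_value)
        .reset_index()
    )


# Same shape of data as in the question, with a fixed size for simplicity.
labels = ["a", "b", "c", "d"]
pairs = pd.DataFrame([(x, y) for x in labels for y in labels], columns=["x", "y"])
pairs["value"] = 150
pairs_nodiag = pairs[pairs["x"] != pairs["y"]]

# The diagonal rows are restored with value 0, so 'a' is the first y again.
patched = fill_missing_pairs(pairs_nodiag, labels)
```

Passing `patched` to `scatter` should then give the same axis order as the complete data, since matplotlib assigns category positions in order of first appearance.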

1 Answer

You need to set up the category units yourself with your desired mapping, which is what a StrCategoryConverter would otherwise do for you (by default, matplotlib maps strings to numbers in the sequence they occur).

import matplotlib.category as mcat

# insert the following before scatter = ax[1].scatter(...
units = mcat.UnitData(sorted(pairs_nodiag.y.unique()))
ax[1].yaxis.set_units(units)
ax[1].yaxis.set_major_locator(mcat.StrCategoryLocator(units._mapping))
ax[1].yaxis.set_major_formatter(mcat.StrCategoryFormatter(units._mapping))


UPDATE: The following is the official way to do it without using _mapping:

import matplotlib.category  # import the submodule explicitly

# insert the following before scatter = ax[1].scatter(...
scc = matplotlib.category.StrCategoryConverter()
units = scc.default_units(sorted(pairs_nodiag.y.unique()), ax[1].yaxis)
axisinfo = scc.axisinfo(units, ax[1].yaxis)
ax[1].yaxis.set_major_locator(axisinfo.majloc)
ax[1].yaxis.set_major_formatter(axisinfo.majfmt)
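
For reference, here is a minimal, self-contained sketch of the converter-based approach applied to data like that in the question. The wrapper function `scatter_with_y_order`, the fixed marker size, and the `Agg` backend are assumptions added for illustration; the matplotlib calls are the same ones used above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere

import matplotlib.category as mcat
import matplotlib.pyplot as plt
import pandas as pd


def scatter_with_y_order(ax, df, y_order):
    """Scatter df['x'] against df['y'] with the y categories pinned to y_order.

    Hypothetical wrapper: the units must be registered on the axis
    before scatter() is called, as noted in the answer.
    """
    scc = mcat.StrCategoryConverter()
    units = scc.default_units(y_order, ax.yaxis)
    axisinfo = scc.axisinfo(units, ax.yaxis)
    ax.yaxis.set_major_locator(axisinfo.majloc)
    ax.yaxis.set_major_formatter(axisinfo.majfmt)
    return ax.scatter(x=df["x"], y=df["y"], s=df["value"])


labels = ["a", "b", "c", "d"]
pairs = pd.DataFrame([(x, y) for x in labels for y in labels], columns=["x", "y"])
pairs["value"] = 150
pairs_nodiag = pairs[pairs["x"] != pairs["y"]]

fig, ax = plt.subplots()
scatter_with_y_order(ax, pairs_nodiag, sorted(pairs_nodiag["y"].unique()))

# Numeric positions assigned to each label on the y axis.
positions = ax.yaxis.convert_units(labels)
```

Without the pinned order, the first y value in `pairs_nodiag` is 'b', so 'a' would end up last on the axis; here `positions` can be used to check that 'a' comes first.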

1 Comment

@Stef: Thanks. Today I learned that matplotlib.category can be used to sort an axis. Upvoted! I deleted my answer as it is not relevant anymore.