I would like to make beautiful scatter plots with histograms above and to the right of the scatter plot, as is possible in seaborn with jointplot:

[Figure: seaborn jointplot]

I am looking for suggestions on how to achieve this. I am having trouble installing pandas, and I do not need the entire seaborn module.

  • To be clear, your question is how to implement sns.jointplot in vanilla matplotlib? Commented May 3, 2016 at 15:32
  • More or less. My question is how to place another box above a scatter plot, so I can draw a histogram there. Commented May 3, 2016 at 15:34
  • Check out matplotlib.gridspec.GridSpec, specifically the example at the bottom. Without gridspec, you can follow this clear example. Commented May 3, 2016 at 15:36
  • Further, here's a similar example on stackoverflow: stackoverflow.com/questions/20525983/… Commented May 3, 2016 at 15:38
  • Matplotlib now has its own example, 'Show the marginal distributions of a scatter plot as histograms at the sides of the plot.': matplotlib.org/stable/gallery/axes_grid1/… Commented Oct 18, 2023 at 17:59
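
As a rough illustration of the GridSpec suggestion in these comments, the sketch below gives the scatter plot most of the figure and leaves narrow strips for the marginal histograms. The 4:1 ratios, the spacing values, the bin count, and the sample data are assumptions chosen for illustration; none of them come from the question.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # sample data, assumed for illustration
x = rng.normal(size=500)
y = rng.normal(size=500)

fig = plt.figure(figsize=(6, 6))
# 2x2 grid; the ratios make the main panel four times the size of the strips
gs = fig.add_gridspec(2, 2, width_ratios=(4, 1), height_ratios=(1, 4),
                      wspace=0.05, hspace=0.05)

ax_main = fig.add_subplot(gs[1, 0])
# Sharing an axis keeps each histogram aligned with the scatter plot
ax_top = fig.add_subplot(gs[0, 0], sharex=ax_main)
ax_right = fig.add_subplot(gs[1, 1], sharey=ax_main)

ax_main.scatter(x, y, marker='.')
ax_top.hist(x, bins=30)
ax_right.hist(y, bins=30, orientation='horizontal')
```

Calling plt.show() or fig.savefig(...) afterwards displays or saves the figure.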

3 Answers

I encountered the same problem today. Additionally I wanted a CDF for the marginals.

Code:

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np

x = np.random.beta(2,5,size=int(1e4))
y = np.random.randn(int(1e4))

# 3x3 grid: scatter plot in the lower-left 2x2 block, marginals in the top row and right column
fig = plt.figure(figsize=(8,8))
gs = gridspec.GridSpec(3, 3)
ax_main = plt.subplot(gs[1:3, :2])
ax_xDist = plt.subplot(gs[0, :2],sharex=ax_main)
ax_yDist = plt.subplot(gs[1:3, 2],sharey=ax_main)
    
ax_main.scatter(x,y,marker='.')
ax_main.set(xlabel="x data", ylabel="y data")

# Top marginal: histogram of x, with its cumulative distribution on a twin y axis
ax_xDist.hist(x,bins=100,align='mid')
ax_xDist.set(ylabel='count')
ax_xCumDist = ax_xDist.twinx()
ax_xCumDist.hist(x,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid')
ax_xCumDist.tick_params('y', colors='r')
ax_xCumDist.set_ylabel('cumulative',color='r')

# Right marginal: histogram of y, with its cumulative distribution on a twin x axis
ax_yDist.hist(y,bins=100,orientation='horizontal',align='mid')
ax_yDist.set(xlabel='count')
ax_yCumDist = ax_yDist.twiny()
ax_yCumDist.hist(y,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid',orientation='horizontal')
ax_yCumDist.tick_params('x', colors='r')
ax_yCumDist.set_xlabel('cumulative',color='r')

plt.show()

I hope this helps the next person searching for a scatter plot with marginal distributions.

2 Comments

Your pic is beautiful, +1, but the code returns an error: AttributeError: 'Polygon' object has no property 'normed'. Please correct your solution or tell me what I'm doing wrong.
Figured it out: replace normed=True with density=True.
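
For context on this exchange: recent Matplotlib releases no longer accept the normed keyword, and density=True is its replacement. The small check below, using made-up sample data, suggests why the cumulative curve can be read as an empirical CDF: with density=True and cumulative=True together, the last bin is scaled to 1.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # sample data, assumed for illustration
x = rng.beta(2, 5, size=10_000)

fig, ax = plt.subplots()
# density=True combined with cumulative=True normalizes the last bin to 1
counts, edges, _ = ax.hist(x, bins=100, cumulative=True, density=True,
                           histtype='step', color='r')
print(counts[-1])
```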

Here's an example of how to do it, using gridspec.GridSpec:

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np

x = np.random.rand(50)
y = np.random.rand(50)

fig = plt.figure()

gs = GridSpec(4,4)

ax_joint = fig.add_subplot(gs[1:4,0:3])
ax_marg_x = fig.add_subplot(gs[0,0:3])
ax_marg_y = fig.add_subplot(gs[1:4,3])

ax_joint.scatter(x,y)
ax_marg_x.hist(x)
ax_marg_y.hist(y,orientation="horizontal")

# Turn off tick labels on marginals
plt.setp(ax_marg_x.get_xticklabels(), visible=False)
plt.setp(ax_marg_y.get_yticklabels(), visible=False)

# Set labels on joint
ax_joint.set_xlabel('Joint x label')
ax_joint.set_ylabel('Joint y label')

# Set labels on marginals
ax_marg_y.set_xlabel('Marginal x label')
ax_marg_x.set_ylabel('Marginal y label')
plt.show()

3 Comments

Nice, but how do I remove ticks only from the histograms (without suppressing the axes), and how do I add labels selectively?
Now my labels appear on plot [0,0] instead of [1,0]. I want a ylabel on plot [0,0], an xlabel on plot [1,1], and both labels on plot [1,0].
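
One possible way to address both comments, sketched under the assumption that the marginal axes share their data axis with the joint axes: tick_params can hide the tick labels on the marginals while leaving the axes themselves in place, and each label is set on the specific axes that should carry it. The label strings and sample data are placeholders.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np

rng = np.random.default_rng(0)  # sample data, assumed for illustration
x = rng.random(50)
y = rng.random(50)

fig = plt.figure()
gs = GridSpec(4, 4)

ax_joint = fig.add_subplot(gs[1:4, 0:3])
# Sharing the data axis keeps each marginal aligned with the scatter plot
ax_marg_x = fig.add_subplot(gs[0, 0:3], sharex=ax_joint)
ax_marg_y = fig.add_subplot(gs[1:4, 3], sharey=ax_joint)

ax_joint.scatter(x, y)
ax_marg_x.hist(x)
ax_marg_y.hist(y, orientation="horizontal")

# Hide only the tick labels on the shared axes; spines and tick marks stay
ax_marg_x.tick_params(axis="x", labelbottom=False)
ax_marg_y.tick_params(axis="y", labelleft=False)

# Both labels on the joint plot, one label on each marginal
ax_joint.set_xlabel("x data")
ax_joint.set_ylabel("y data")
ax_marg_x.set_ylabel("count")  # top histogram: y label only
ax_marg_y.set_xlabel("count")  # right histogram: x label only
```

Passing bottom=False or left=False to tick_params as well would also remove the tick marks.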

I strongly recommend flipping the right histogram by adding these three lines of code to the current best answer, before plt.show():

ax_yDist.invert_xaxis()
ax_yDist.yaxis.tick_right()
ax_yCumDist.invert_xaxis()

[Figure: after flipping the right histogram]

The advantage is that anyone viewing the figure can easily compare the two histograms by mentally moving the right histogram and rotating it clockwise.

In contrast, in the plot in the question and in all the other answers, if you want to compare the two histograms, your first reaction is to rotate the right histogram counterclockwise, which leads to wrong conclusions because the y axis gets inverted. Indeed, the right CDF of the current best answer looks decreasing at first sight:

[Figure: before flipping the right histogram]
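
For readers who want to try the flip without the full figure from the other answer, here is a minimal self-contained sketch with only a scatter plot and the right-hand histogram. The layout, bin count, and sample data are assumptions; the last two calls are the ones recommended in this answer.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # sample data, assumed for illustration
x = rng.beta(2, 5, size=1000)
y = rng.normal(size=1000)

fig = plt.figure()
gs = fig.add_gridspec(1, 2, width_ratios=(3, 1))
ax_main = fig.add_subplot(gs[0, 0])
ax_yDist = fig.add_subplot(gs[0, 1], sharey=ax_main)

ax_main.scatter(x, y, marker='.')
ax_yDist.hist(y, bins=50, orientation='horizontal')

# Reverse the count axis so the bars start at the right edge and grow leftwards
ax_yDist.invert_xaxis()
# Move the y ticks to the right-hand side of the histogram
ax_yDist.yaxis.tick_right()
```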
