Creating a Confidence Ellipse in a scatterplot using matplotlib

Question

How do I create a confidence ellipse in a scatterplot using matplotlib?

The following code works until creating scatter plot. Then, is anyone familiar with putting confidence ellipses over the scatter plot?

import numpy as np
import matplotlib.pyplot as plt
x = [5,7,11,15,16,17,18]
y = [8, 5, 8, 9, 17, 18, 25]

plt.scatter(x,y)
plt.show()

Following is the reference for Confidence Ellipses from SAS.

http://support.sas.com/documentation/cdl/en/grstatproc/62603/HTML/default/viewer.htm#a003160800.htm

The code in sas is like this:

proc sgscatter data=sashelp.iris(where=(species="Versicolor"));
  title "Versicolor Length and Width";
  compare y=(sepalwidth petalwidth)
          x=(sepallength petallength)
          / reg ellipse=(type=mean) spacing=4;
run;

@ Saullo Castro have you seen the code in sas and do you think that the method implemented in sas and in the link you provided the same? — 2964502
– 2964502, Commented Nov 21, 2013 at 16:42
@tester3 - In the example you linked to, the confidence ellipse shown is for the mean, as opposed to for another sample drawn from the same population. (This is what type=mean is specifying.) My answer that @SaulloCastro linked to shows a confidence ellipse for the entire population (in other words, the area that another sample from the population should fall inside, identical to type=predicted in SAS). Jamie's answer uses this method as well. — Joe Kington
– Joe Kington, Commented Nov 21, 2013 at 23:35

Jaime · Accepted Answer · 2013-11-21 19:55:00Z

32

The following code draws a one, two, and three standard deviation sized ellipses:

x = [5,7,11,15,16,17,18]
y = [8, 5, 8, 9, 17, 18, 25]
cov = np.cov(x, y)
lambda_, v = np.linalg.eig(cov)
lambda_ = np.sqrt(lambda_)
from matplotlib.patches import Ellipse
import matplotlib.pyplot as plt
ax = plt.subplot(111, aspect='equal')
for j in xrange(1, 4):
    ell = Ellipse(xy=(np.mean(x), np.mean(y)),
                  width=lambda_[0]*j*2, height=lambda_[1]*j*2,
                  angle=np.rad2deg(np.arccos(v[0, 0])))
    ell.set_facecolor('none')
    ax.add_artist(ell)
plt.scatter(x, y)
plt.show()

enter image description here

edited Nov 21, 2013 at 19:55

answered Nov 21, 2013 at 17:04

Jaime

67.7k19 gold badges128 silver badges164 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

Joe Kington Over a year ago

@Jamie - +1 Shouldn't the ellipses be twice as wide and high, though? Currently, they're N-sigma wide and high, as opposed to showing the region within N-sigma of the mean.

Jaime Over a year ago

@JoeKington Yes, I do think you are absolutely right, matplotlib makes it kind of clear they are the width and height, not the semi-width and semi-height... Have edited the code and image. Thanks!

grasshopper Over a year ago

Better check @Ben 's answer below, because this code doesn't compute the angle properly. It appeared about 90 degrees flipped in my case.

nathan lachenmyer Over a year ago

Can confirm that this example computes the angle incorrectly; the correct angle is computed as angle= np.rad2deg(np.arctan2(*v[:,0][::-1]))

lhoupert Over a year ago

Yes the proper angle definition should be np.degrees(np.arctan2(*v[:,0][::-1]))

|

Community · Accepted Answer · 2017-05-23 12:09:53Z

29

After giving the accepted answer a go, I found that it doesn't choose the quadrant correctly when calculating theta, as it relies on np.arccos:

oops

Taking a look at the 'possible duplicate' and Joe Kington's solution on github, I watered his code down to this:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def eigsorted(cov):
    vals, vecs = np.linalg.eigh(cov)
    order = vals.argsort()[::-1]
    return vals[order], vecs[:,order]

x = [5,7,11,15,16,17,18]
y = [25, 18, 17, 9, 8, 5, 8]

nstd = 2
ax = plt.subplot(111)

cov = np.cov(x, y)
vals, vecs = eigsorted(cov)
theta = np.degrees(np.arctan2(*vecs[:,0][::-1]))
w, h = 2 * nstd * np.sqrt(vals)
ell = Ellipse(xy=(np.mean(x), np.mean(y)),
              width=w, height=h,
              angle=theta, color='black')
ell.set_facecolor('none')
ax.add_artist(ell)
plt.scatter(x, y)
plt.show()

neg slope

edited May 23, 2017 at 12:09

CommunityBot

11 silver badge

answered Jul 29, 2014 at 18:38

Ben

4334 silver badges7 bronze badges

5 Comments

HansSnah Over a year ago

Does anyone know how to generalise this to 3D (or possibly n dimensions)?

tauran Over a year ago

@Ben Why w, h = 2 * nstd * np.sqrt(vals)? That's 4 * np.sqrt(vals).

Farshid Rayhan Over a year ago

why it doesn't work when I change plt.scatter(x, y) to plt.scatter(np.mean(x), np.mean(y)). I want the ellipsoid around the mean

vicemagui Over a year ago

I tried the example and got a blank figure. Does anyone know why?

Jürgen Brauer Over a year ago

@tauran: the radii of the ellipse are sqrt(eigenvalue1) and sqrt(eigenvalue2). vals contains both eigenvalues: [eigenvalue1, eigenvalue2]. When drawing an ellipse you need to specify the width and not the radii of the ellipse: for this 2*sqrt(vals). And now there are many ellipses, but if you want to display the confidence area, such that 95% of the data points are inside that ellipse area, we need to display 2 standard deviations. For this, the multiplication with nstd (=2). en.wikipedia.org/wiki/… & cookierobotics.com/007

Mad Physicist · Accepted Answer · 2023-01-28 05:14:23Z

3

There is no need to compute angles explicitly once you have the eigendecomposition of your covariance matrix: the rotation portion already encodes that information for you for free:

cov = np.cov(x, y)
val, rot = np.linalg.eig(cov)
val = np.sqrt(val)
center = np.mean([x, y], axis=1)[:, None]

t = np.linspace(0, 2.0 * np.pi, 1000)
xy = np.stack((np.cos(t), np.sin(t)), axis=-1)

plt.scatter(x, y)
plt.plot(*(rot @ (val * xy).T + center))

You can expand your ellipse by applying a scale before translation:

plt.plot(*(2 * rot @ (val * xy).T + center))

answered Jan 28, 2023 at 5:14

Mad Physicist

116k29 gold badges202 silver badges292 bronze badges

Comments

Kerem · Accepted Answer · 2021-01-25 10:47:06Z

1

In addition to the accepted answer: I think the correct angle should be:

angle=np.rad2deg(np.arctan2(*v[:,np.argmax(abs(lambda_))][::-1])))

and the corresponding width (larger eigenvalue) and height should be:

width=lambda_[np.argmax(abs(lambda_))]*j*2, height=lambda_[1-np.argmax(abs(lambda_))]*j*2

As we need to find the corresponding eigenvector for the largest eigenvalue. Since "the eigenvalues are not necessarily ordered" according to the specs https://numpy.org/doc/stable/reference/generated/numpy.linalg.eig.html and v[:,i] is the eigenvector corresponding to the eigenvalue lambda_[i]; we should find the correct column of the eigenvector by np.argmax(abs(lambda_)).

edited Jan 25, 2021 at 10:47

answered Jan 25, 2021 at 9:00

Kerem

113 bronze badges

Comments

Diego Ravignani · Accepted Answer · 2023-12-28 18:49:01Z

We draw a confidence ellipse by calling the following function that calculates its x and y coordinates:

import numpy as np
def get_ellipse(center: np.ndarray, cova: np.ndarray, nsigma: int = 1, npoints: int = 1000) -> np.ndarray:
    """
    Return the coordinates of a covariance ellipse.

    Args:
        center (np.ndarray): The center of the ellipse.
        cova (np.ndarray): The covariance matrix.
        nsigma (int, optional): The number of standard deviations for the ellipse. Defaults to 1.
        npoints (int, optional): The number of points to generate on the ellipse. Defaults to 1000.

    Returns:
        np.ndarray: The coordinates of the ellipse.
    """
    cholesky_l = np.linalg.cholesky(cova)
    t = np.linspace(0, 2 * np.pi, npoints)
    circle = np.column_stack([np.cos(t), np.sin(t)])
    ellipse = nsigma * circle @ cholesky_l.T + center
    return ellipse.T

The function, proposed by one of my teaching assistants, uses the Cholesky decomposition. The confidence ellipses can be plotted with the following code:

import matplotlib.pyplot as plt

x = [5,7,11,15,16,17,18]
y = [8, 5, 8, 9, 17, 18, 25]
mean = np.mean([x, y])
cov = np.cov(x, y, ddof=1)

fig, ax = plt.subplots()
ax.set_xlabel("x")
ax.set_ylabel("y")

ax.scatter(x,y)

x_1sigma, y_1sigma = get_ellipse(mean, cov, nsigma=1)
ax.plot(x_1sigma, y_1sigma, label='$1\sigma$')
x_2sigma, y_2sigma = get_ellipse(mean, cov, nsigma=2)
ax.plot(x_2sigma, y_2sigma, label='$2\sigma$')
x_3sigma, y_3sigma = get_ellipse(mean, cov, nsigma=3)
ax.plot(x_3sigma, y_3sigma, label='$3\sigma$')

ax.legend()

We prefer using ddof=1 in numpy.cov to get an unbiased estimator of the covariance matrix.

Plot of the 1, 2, and 3σ covariance ellipses

Collectives™ on Stack Overflow

Creating a Confidence Ellipse in a scatterplot using matplotlib

5 Answers 5

6 Comments

5 Comments

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

6 Comments

5 Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related