26

How do I create a confidence ellipse in a scatterplot using matplotlib?

The following code works until creating scatter plot. Then, is anyone familiar with putting confidence ellipses over the scatter plot?

import numpy as np
import matplotlib.pyplot as plt
x = [5,7,11,15,16,17,18]
y = [8, 5, 8, 9, 17, 18, 25]

plt.scatter(x,y)
plt.show()

Following is the reference for Confidence Ellipses from SAS.

http://support.sas.com/documentation/cdl/en/grstatproc/62603/HTML/default/viewer.htm#a003160800.htm

The code in sas is like this:

proc sgscatter data=sashelp.iris(where=(species="Versicolor"));
  title "Versicolor Length and Width";
  compare y=(sepalwidth petalwidth)
          x=(sepallength petallength)
          / reg ellipse=(type=mean) spacing=4;
run;
3
  • 1
    possible duplicate of multidimensional confidence intervals Commented Nov 21, 2013 at 16:35
  • 2
    @ Saullo Castro have you seen the code in sas and do you think that the method implemented in sas and in the link you provided the same? Commented Nov 21, 2013 at 16:42
  • 2
    @tester3 - In the example you linked to, the confidence ellipse shown is for the mean, as opposed to for another sample drawn from the same population. (This is what type=mean is specifying.) My answer that @SaulloCastro linked to shows a confidence ellipse for the entire population (in other words, the area that another sample from the population should fall inside, identical to type=predicted in SAS). Jamie's answer uses this method as well. Commented Nov 21, 2013 at 23:35

5 Answers 5

32

The following code draws a one, two, and three standard deviation sized ellipses:

x = [5,7,11,15,16,17,18]
y = [8, 5, 8, 9, 17, 18, 25]
cov = np.cov(x, y)
lambda_, v = np.linalg.eig(cov)
lambda_ = np.sqrt(lambda_)
from matplotlib.patches import Ellipse
import matplotlib.pyplot as plt
ax = plt.subplot(111, aspect='equal')
for j in xrange(1, 4):
    ell = Ellipse(xy=(np.mean(x), np.mean(y)),
                  width=lambda_[0]*j*2, height=lambda_[1]*j*2,
                  angle=np.rad2deg(np.arccos(v[0, 0])))
    ell.set_facecolor('none')
    ax.add_artist(ell)
plt.scatter(x, y)
plt.show()

enter image description here

Sign up to request clarification or add additional context in comments.

6 Comments

@Jamie - +1 Shouldn't the ellipses be twice as wide and high, though? Currently, they're N-sigma wide and high, as opposed to showing the region within N-sigma of the mean.
@JoeKington Yes, I do think you are absolutely right, matplotlib makes it kind of clear they are the width and height, not the semi-width and semi-height... Have edited the code and image. Thanks!
Better check @Ben 's answer below, because this code doesn't compute the angle properly. It appeared about 90 degrees flipped in my case.
Can confirm that this example computes the angle incorrectly; the correct angle is computed as angle= np.rad2deg(np.arctan2(*v[:,0][::-1]))
Yes the proper angle definition should be np.degrees(np.arctan2(*v[:,0][::-1]))
|
29

After giving the accepted answer a go, I found that it doesn't choose the quadrant correctly when calculating theta, as it relies on np.arccos:

oops

Taking a look at the 'possible duplicate' and Joe Kington's solution on github, I watered his code down to this:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse

def eigsorted(cov):
    vals, vecs = np.linalg.eigh(cov)
    order = vals.argsort()[::-1]
    return vals[order], vecs[:,order]

x = [5,7,11,15,16,17,18]
y = [25, 18, 17, 9, 8, 5, 8]

nstd = 2
ax = plt.subplot(111)

cov = np.cov(x, y)
vals, vecs = eigsorted(cov)
theta = np.degrees(np.arctan2(*vecs[:,0][::-1]))
w, h = 2 * nstd * np.sqrt(vals)
ell = Ellipse(xy=(np.mean(x), np.mean(y)),
              width=w, height=h,
              angle=theta, color='black')
ell.set_facecolor('none')
ax.add_artist(ell)
plt.scatter(x, y)
plt.show()

neg slope

5 Comments

Does anyone know how to generalise this to 3D (or possibly n dimensions)?
@Ben Why w, h = 2 * nstd * np.sqrt(vals)? That's 4 * np.sqrt(vals).
why it doesn't work when I change plt.scatter(x, y) to plt.scatter(np.mean(x), np.mean(y)). I want the ellipsoid around the mean
I tried the example and got a blank figure. Does anyone know why?
@tauran: the radii of the ellipse are sqrt(eigenvalue1) and sqrt(eigenvalue2). vals contains both eigenvalues: [eigenvalue1, eigenvalue2]. When drawing an ellipse you need to specify the width and not the radii of the ellipse: for this 2*sqrt(vals). And now there are many ellipses, but if you want to display the confidence area, such that 95% of the data points are inside that ellipse area, we need to display 2 standard deviations. For this, the multiplication with nstd (=2). en.wikipedia.org/wiki/… & cookierobotics.com/007
3

There is no need to compute angles explicitly once you have the eigendecomposition of your covariance matrix: the rotation portion already encodes that information for you for free:

cov = np.cov(x, y)
val, rot = np.linalg.eig(cov)
val = np.sqrt(val)
center = np.mean([x, y], axis=1)[:, None]

t = np.linspace(0, 2.0 * np.pi, 1000)
xy = np.stack((np.cos(t), np.sin(t)), axis=-1)

plt.scatter(x, y)
plt.plot(*(rot @ (val * xy).T + center))

enter image description here

You can expand your ellipse by applying a scale before translation:

plt.plot(*(2 * rot @ (val * xy).T + center))

enter image description here

Comments

1

In addition to the accepted answer: I think the correct angle should be:

angle=np.rad2deg(np.arctan2(*v[:,np.argmax(abs(lambda_))][::-1]))) 

and the corresponding width (larger eigenvalue) and height should be:

width=lambda_[np.argmax(abs(lambda_))]*j*2, height=lambda_[1-np.argmax(abs(lambda_))]*j*2

As we need to find the corresponding eigenvector for the largest eigenvalue. Since "the eigenvalues are not necessarily ordered" according to the specs https://numpy.org/doc/stable/reference/generated/numpy.linalg.eig.html and v[:,i] is the eigenvector corresponding to the eigenvalue lambda_[i]; we should find the correct column of the eigenvector by np.argmax(abs(lambda_)).

Comments

0

We draw a confidence ellipse by calling the following function that calculates its x and y coordinates:

import numpy as np
def get_ellipse(center: np.ndarray, cova: np.ndarray, nsigma: int = 1, npoints: int = 1000) -> np.ndarray:
    """
    Return the coordinates of a covariance ellipse.

    Args:
        center (np.ndarray): The center of the ellipse.
        cova (np.ndarray): The covariance matrix.
        nsigma (int, optional): The number of standard deviations for the ellipse. Defaults to 1.
        npoints (int, optional): The number of points to generate on the ellipse. Defaults to 1000.

    Returns:
        np.ndarray: The coordinates of the ellipse.
    """
    cholesky_l = np.linalg.cholesky(cova)
    t = np.linspace(0, 2 * np.pi, npoints)
    circle = np.column_stack([np.cos(t), np.sin(t)])
    ellipse = nsigma * circle @ cholesky_l.T + center
    return ellipse.T

The function, proposed by one of my teaching assistants, uses the Cholesky decomposition. The confidence ellipses can be plotted with the following code:

import matplotlib.pyplot as plt

x = [5,7,11,15,16,17,18]
y = [8, 5, 8, 9, 17, 18, 25]
mean = np.mean([x, y])
cov = np.cov(x, y, ddof=1)

fig, ax = plt.subplots()
ax.set_xlabel("x")
ax.set_ylabel("y")

ax.scatter(x,y)

x_1sigma, y_1sigma = get_ellipse(mean, cov, nsigma=1)
ax.plot(x_1sigma, y_1sigma, label='$1\sigma$')
x_2sigma, y_2sigma = get_ellipse(mean, cov, nsigma=2)
ax.plot(x_2sigma, y_2sigma, label='$2\sigma$')
x_3sigma, y_3sigma = get_ellipse(mean, cov, nsigma=3)
ax.plot(x_3sigma, y_3sigma, label='$3\sigma$')

ax.legend()

We prefer using ddof=1 in numpy.cov to get an unbiased estimator of the covariance matrix.

Plot of the 1, 2, and 3σ covariance ellipses

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.