
I'm trying to do a simple variance calculation on a set of 3 numbers:

numpy.var([0.82159889, 0.26007962, 0.09818412])

which returns

0.09609366366174843

However, when I calculate the variance myself, I get

0.1441405

Seems like such a simple thing, but I haven't been able to find an answer yet.

2 Answers


As the documentation explains:

ddof : int, optional
    "Delta Degrees of Freedom": the divisor used in the calculation is
    ``N - ddof``, where ``N`` represents the number of elements. By
    default `ddof` is zero.

And so you have:

>>> numpy.var([0.82159889, 0.26007962, 0.09818412], ddof=0)
0.09609366366174843
>>> numpy.var([0.82159889, 0.26007962, 0.09818412], ddof=1)
0.14414049549262264

Both conventions are common enough that you always need to check which one is being used by whatever package you're using, in any language.
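As an illustration of how the two conventions show up side by side, here is a minimal sketch using Python's standard-library statistics module, which exposes them as two separate functions instead of a ddof argument. The variable names are only for illustration.

```python
import statistics

vals = [0.82159889, 0.26007962, 0.09818412]

# Population variance: divides by N (the same convention as numpy.var's default, ddof=0).
population = statistics.pvariance(vals)

# Sample variance: divides by N - 1 (the same convention as numpy.var(..., ddof=1)).
sample = statistics.variance(vals)

# The two results differ by the factor N / (N - 1), which is 1.5 for three values.
ratio = sample / population
print(population, sample, ratio)
```

The values should agree with the two numpy.var results above (about 0.0961 and 0.1441), up to floating-point rounding.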


1 Comment

Thanks! I just figured it out before I came back to check out the answer.

np.var by default calculates the population variance.

The Sum of Squared Errors can be calculated as follows:

>>> import numpy as np
>>> vals = [0.82159889, 0.26007962, 0.09818412]
>>> mean = sum(vals)/3.0
>>> mean
0.3932875433333333
>>> sse = sum((mean-val)**2 for val in vals)
>>> sse
0.2882809909852453

This is the population variance:

>>> sse/3 
0.09609366366174843
>>> np.var(vals)
0.09609366366174843

This is the sample variance:

>>> sse/(3-1)
0.14414049549262264
>>> np.var(vals, ddof=1)
0.14414049549262264
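The manual steps above can be folded into one small function. This is only a sketch: variance is a hypothetical helper written for this answer, not a NumPy function, and its ddof parameter simply mirrors the meaning described in the NumPy documentation (the divisor is N - ddof).

```python
def variance(values, ddof=0):
    """Variance with divisor N - ddof (hypothetical helper, not part of NumPy)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more values than ddof")
    mean = sum(values) / n
    # Sum of squared deviations from the mean.
    sse = sum((x - mean) ** 2 for x in values)
    return sse / (n - ddof)


vals = [0.82159889, 0.26007962, 0.09818412]
pop_var = variance(vals)             # divides by N, like np.var(vals)
sample_var = variance(vals, ddof=1)  # divides by N - 1, like np.var(vals, ddof=1)
print(pop_var, sample_var)
```

With ddof=0 this divides by 3 and with ddof=1 it divides by 2, so the results should match the population and sample variances shown above, up to floating-point rounding.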

You can read more about the difference here.
