The standard deviation computed by pandas differs from the one computed by numpy. Why, and which one is correct? (The relative difference is 3.5%, which seems too large to come from rounding.)

Example

import numpy as np
import pandas as pd
from io import StringIO  # Python 3; on Python 2 use: from StringIO import StringIO

a='''0.057411
0.024367
0.021247
-0.001809
-0.010874
-0.035845
0.001663
0.043282
0.004433
-0.007242
0.029294
0.023699
0.049654
0.034422
-0.005380'''


# sep=r'\s+' splits on whitespace; delim_whitespace=True is deprecated in newer pandas releases
df = pd.read_csv(StringIO(a.strip()), sep=r'\s+', header=None)

df.std()==np.std(df) # False
df.std() # 0.025801
np.std(df) # 0.024926

abs(0.024926 - 0.025801) / 0.024926 # about 0.035, i.e. a 3.5% relative difference

I use these versions:

pandas '0.14.0'
numpy '1.8.1'

2 Answers

89

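In a nutshell, neither is "incorrect". Pandas divides by N-1 by default (ddof=1, the sample standard deviation, built on the unbiased variance estimator), whereas NumPy divides by N by default (ddof=0, the population standard deviation).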

To make them behave the same, pass ddof=1 to numpy.std().
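As a rough check of the two conventions, the sketch below recomputes both standard deviations by hand from the question's data and compares them with np.std. The variable names (x, ss, std_pop, std_sample) are only illustrative. For N = 15 the two results differ by a factor of sqrt(N/(N-1)) = sqrt(15/14), roughly 1.035, which would account for the 3.5% gap reported in the question.

```python
import numpy as np

# Data copied from the question.
x = np.array([
    0.057411, 0.024367, 0.021247, -0.001809, -0.010874,
    -0.035845, 0.001663, 0.043282, 0.004433, -0.007242,
    0.029294, 0.023699, 0.049654, 0.034422, -0.005380,
])
n = x.size
ss = np.sum((x - x.mean()) ** 2)  # sum of squared deviations from the mean

std_pop = np.sqrt(ss / n)           # divide by N, as np.std(x) does by default
std_sample = np.sqrt(ss / (n - 1))  # divide by N-1, as np.std(x, ddof=1) does

print(std_pop, np.std(x))               # should agree with each other
print(std_sample, np.std(x, ddof=1))    # should agree with each other
print(std_sample / std_pop, np.sqrt(n / (n - 1)))  # ratio depends only on N
```

Because the ratio depends only on N, the difference between the two conventions shrinks as the sample grows.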

For further discussion, see

3 Comments

yes, in fact df.std()==np.std(df, ddof=1) is True! Therefore the question now becomes which estimator is better :-), just kidding...
For the record, people considering using df.std() and np.std(ddof=1) interchangeably should also be aware of another difference between the two: np.std returns nan if there are any missing values whereas df.std returns the standard deviation of the non-missing values. If you want to ignore nans use np.nanstd().
This implies that df.std() != df.values.std(), which I did not expect at all. This seems pretty confusing.
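The two differences raised in these comments (missing values and the default ddof) can be seen side by side in a small sketch. The Series below is made-up illustration data, not from the question, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value (not from the question).
s = pd.Series([1.0, 2.0, np.nan, 4.0])

pandas_default = s.std()                        # skips NaN, uses ddof=1
numpy_default = np.std(s.values)                # NaN propagates, uses ddof=0
numpy_nan_aware = np.nanstd(s.values, ddof=1)   # skips NaN, ddof=1 requested
values_std = s.dropna().values.std()            # ndarray method, uses ddof=0

print(pandas_default, numpy_nan_aware)  # these two should agree
print(numpy_default)                    # nan
print(values_std)                       # smaller, since it divides by N
```

So matching pandas from the NumPy side takes both a NaN-aware function and ddof=1.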
10

For pandas to perform the same as numpy, you can pass the ddof=0 parameter: df.std(ddof=0).
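A minimal sketch of that direction, using the question's data in a one-column DataFrame. The column name "x" is an assumption made for illustration. Note that df.std() works column by column, so the NumPy call is given axis=0 to match.

```python
import numpy as np
import pandas as pd

# Data copied from the question; the column name "x" is illustrative.
df = pd.DataFrame({"x": [
    0.057411, 0.024367, 0.021247, -0.001809, -0.010874,
    -0.035845, 0.001663, 0.043282, 0.004433, -0.007242,
    0.029294, 0.023699, 0.049654, 0.034422, -0.005380,
]})

pandas_pop = df.std(ddof=0)             # Series with one entry per column
numpy_pop = np.std(df.values, axis=0)   # array with one entry per column

print(pandas_pop["x"], numpy_pop[0])    # should agree with each other
```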

This short video explains quite well why n-1 might be preferred for samples. https://www.youtube.com/watch?v=Cn0skMJ2F3c
