Why is pandas.Series.std() different from numpy.std()?

Question

This is what I am trying to explain:

>>> a = pd.Series([7, 20, 22, 22])
>>> a.std()
7.2284161474004804
>>> np.std(a)
6.2599920127744575

I have data about many different restaurants. For simplicity I have extracted just one restaurant with four items:

>>> df
    restaurant_id  price
id                      
1           10407      7
3           10407     20
6           10407     22
13          10407     22

For each restaurant, I want to get the standard deviation. However, pandas seems to return wrong values.

>>> df.groupby('restaurant_id').std()
                  price
restaurant_id          
10407          7.228416

We can get the correct value with np.std():

>>> np.std(df['price'])
6.2599920127744575

But obviously, this is not a solution when I have more than one restaurant. How do I do this properly?

Just to make sure, I checked that df['price'].mean() == np.mean(df['price']).

There is a related discussion here, but their suggestions do not work either.

pd.Series([7,20,22,22]).std(ddof=0) would be the same number as np.std
– behzad.nouri, Commented Sep 6, 2014 at 1:37
OK, resolved. I guess I have to think about which one I want to use.
– Sergey Orshanskiy, Commented Sep 6, 2014 at 1:42
FWIW, I wanted to mention .agg(np.std) as a workaround (which wouldn't be an ideal solution in this case, but the pattern is good to know), but actually, that still produces the Bessel-corrected output! I had to do .agg(lambda col: np.std(col)) to get the uncorrected output. I'm not an expert on this, but I think pandas special-cases np.std when it is passed directly to .agg and substitutes its own std (np.std itself is a plain function, not a ufunc).
– wjandrea, Commented Oct 17, 2023 at 19:59
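
The pattern from the last comment can be sketched for the multi-restaurant case. This is a minimal illustration, not something taken from the thread: restaurant 10408 and its prices are made up so that the groupby has more than one group, and the variable names are arbitrary. Wrapping np.std in a lambda hands each group's prices to NumPy, whose default is ddof=0; passing ddof=0 to the groupby std should give the same numbers without the lambda.

```python
import numpy as np
import pandas as pd

# Prices for restaurant 10407 come from the question; restaurant 10408
# is hypothetical, added only so the groupby has more than one group.
df = pd.DataFrame({
    "restaurant_id": [10407, 10407, 10407, 10407, 10408, 10408, 10408],
    "price": [7, 20, 22, 22, 5, 9, 13],
})

grouped = df.groupby("restaurant_id")["price"]

# Ask pandas directly for the population standard deviation (divide by N).
population_std = grouped.std(ddof=0)

# Workaround from the comment: the lambda calls np.std on each group,
# so NumPy's default ddof=0 applies.
via_lambda = grouped.agg(lambda col: np.std(col))

print(population_std)
print(via_lambda)
```

Of the two, std(ddof=0) is probably the more direct choice here; the lambda form is mainly useful when the aggregation has no built-in pandas equivalent.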

wjandrea · Accepted Answer · 2023-10-16 23:22:50Z

34

pandas' std uses Bessel's correction by default, that is, the standard deviation formula with N-1 instead of N in the denominator (ddof=1). NumPy's np.std divides by N by default (ddof=0). To make pandas divide by N as well, pass ddof=0:

a.std(ddof=0) == np.std(a)
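
To see where the two numbers come from, the calculation can be written out by hand. The sketch below uses only the four prices from the question; the variable names are illustrative and not part of either library.

```python
import math

import numpy as np
import pandas as pd

a = pd.Series([7, 20, 22, 22])

n = len(a)
mean = a.sum() / n                             # 17.75
sum_sq_dev = sum((x - mean) ** 2 for x in a)   # 156.75

sample_std = math.sqrt(sum_sq_dev / (n - 1))   # N-1: Bessel's correction
population_std = math.sqrt(sum_sq_dev / n)     # N: no correction

# pandas defaults to ddof=1, NumPy to ddof=0.
assert math.isclose(sample_std, a.std())
assert math.isclose(population_std, np.std(a))
assert math.isclose(population_std, a.std(ddof=0))

# np.std accepts ddof too, so NumPy can reproduce the pandas default.
assert math.isclose(sample_std, np.std(a.to_numpy(), ddof=1))
```

Which denominator is appropriate depends on whether the four prices are treated as the whole population (N) or as a sample from a larger one (N-1).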

edited Oct 16, 2023 at 23:22

wjandrea

answered Sep 6, 2014 at 1:41

Sergey Orshanskiy
