
I'm converting a Matlab script to Python and I am getting different results on the order of 10**-4.

In matlab:

f_mean = f_mean + nanmean(f);
f = f - nanmean(f);

f_t  = gradient(f);
f_tt = gradient(f_t);

if n_loop==1
  theta = atan2( sum(f.*f_tt), sum(f.^2) );
end

theta = -2.2011167e+03

In Python:

f_mean = f_mean + np.nanmean(vel)
vel = vel - np.nanmean(vel)

firstDerivative = np.gradient(vel)
secondDerivative = np.gradient(firstDerivative)

if numberLoop == 1:
    theta = np.arctan2(np.sum(vel * secondDerivative),
                       np.sum(vel**2))

Although firstDerivative and secondDerivative give the same results in Python and Matlab, f_mean is slightly different: -0.0066412 (Matlab) vs. -0.0066414 (Python), and so is theta: -0.4126186 (M) vs. -0.4124718 (P). It is a small difference, but in the end it leads to different results in my scripts.

I know some people have asked about this kind of discrepancy before, but always regarding std (which I understand), never regarding mean values. I wonder why it happens.

Comments:

  • It would help if you could provide a minimal, complete and verifiable example. You've shown the code, which is great, but you haven't provided the input array f_mean, so we can't run the code to reproduce the result. Commented Dec 7, 2017 at 1:25
  • In the Python code, is f_mean initially a numpy array? If so, what is f_mean.dtype? Commented Dec 7, 2017 at 1:25
  • The thing is, the array is huge: over one million data points. But vel would start with [-0.042 -0.028 -0.038 -0.013 -0.026 -0.031 -0.0560 ...] Commented Dec 7, 2017 at 1:26
  • According to Wikipedia, numpy uses pairwise summation, which on a huge array such as yours should have better accuracy than the naive method. I don't know what Matlab does, though. Commented Dec 7, 2017 at 1:51
  • In Python, you could try np.nanmean(vel, dtype=np.longdouble), but that won't make a difference if you are on Windows, where longdouble is the same as double; see the sketch after these comments. Commented Dec 7, 2017 at 2:01
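A minimal sketch of that last suggestion (vel below is a random stand-in for the asker's one-million-point array, which is not available):

import numpy as np

# hypothetical stand-in for the real data
vel = np.random.uniform(-0.05, 0.05, (10**6,))

m64 = np.nanmean(vel)                       # default float64 accumulation
mld = np.nanmean(vel, dtype=np.longdouble)  # wider accumulator where the platform has one
print(m64 - np.float64(mld))                # size of the accumulation difference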

1 Answer

One possible source of the initial difference you describe (between the means) could be numpy's use of pairwise summation, which on large arrays will typically be appreciably more accurate than naive summation:

a = np.random.uniform(-1, 1, (10**6,))
a = np.r_[-a, a]
# so the sum should be zero

a.sum()
# 7.815970093361102e-14

# use cumsum to get naive summation:
a.cumsum()[-1]
# -1.3716805469243809e-11
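For intuition, here is a toy recursive version of pairwise summation (numpy's actual routine is blocked C code, so this is only a sketch of the idea):

def pairwise_sum(x):
    # split in half, sum each half recursively, then add the two partial sums;
    # rounding error grows roughly like O(log n) instead of the O(n) of
    # naive left-to-right accumulation
    if len(x) <= 8:
        total = 0.0
        for v in x:
            total += v
        return total
    mid = len(x) // 2
    return pairwise_sum(x[:mid]) + pairwise_sum(x[mid:])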

Edit (thanks @sascha): for the last word and as a "provably exact" reference you could use math.fsum:

import math
math.fsum(a)
# 0.0
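One caveat, since the question uses nanmean: math.fsum returns nan as soon as the data contain a NaN, so an exact reference for np.nanmean needs the NaNs dropped first. A small sketch:

import math
import numpy as np

def exact_nanmean(x):
    # exact-summation reference for np.nanmean: filter out NaNs, then fsum
    finite = x[~np.isnan(x)]
    return math.fsum(finite) / finite.size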

I don't have Matlab, so I can't check what it does.


2 Comments

  • And just for fun: math.fsum(a.tolist()) (even more precise, but probably much slower).
  • @sascha Neat! It is rather slow, but funnily enough it seems faster than Python's sum... (see the timing sketch below)
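If you want to check those speed claims on your own machine, a quick timeit sketch (numbers will vary by platform):

import timeit

setup = "import math; import numpy as np; a = np.random.uniform(-1, 1, (10**6,))"
for stmt in ["math.fsum(a)", "math.fsum(a.tolist())", "sum(a)", "a.sum()"]:
    t = timeit.timeit(stmt, setup=setup, number=10)
    print(f"{stmt:25} {t / 10:.4f} s per call")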
