
I'm converting a Matlab script to Python and I am getting different results on the order of 10**-4.

In matlab:

f_mean = f_mean + nanmean(f);
f = f - nanmean(f);

f_t  = gradient(f);
f_tt = gradient(f_t);

if n_loop==1
  theta = atan2( sum(f.*f_tt), sum(f.^2) );
end

theta = -2.2011167e+03

In Python:

f_mean = f_mean + np.nanmean(vel)
vel = vel - np.nanmean(vel)

firstDerivative = np.gradient(vel)
secondDerivative = np.gradient(firstDerivative)

if numberLoop == 1:
    theta = np.arctan2(np.sum(vel * secondDerivative),
                       np.sum(vel**2))

Although firstDerivative and secondDerivative give the same results in Python and Matlab, f_mean is slightly different: -0.0066412 (Matlab) vs. -0.0066414 (Python), and so is theta: -0.4126186 (M) vs. -0.4124718 (P). It is a small difference, but in the end it leads to different results in my scripts.

I know some people have asked about this kind of discrepancy before, but always regarding std (which I understand), never regarding mean values. I wonder why it happens.

Comments:

  • It would help if you could provide a minimal, complete and verifiable example. You've shown the code, which is great, but you haven't provided the input array f_mean, so we can't run the code to reproduce the result. Commented Dec 7, 2017 at 1:25
  • In the Python code, is f_mean initially a numpy array? If so, what is f_mean.dtype? Commented Dec 7, 2017 at 1:25
  • The thing is, the array is huge: over one million data points. But vel would start with [-0.042 -0.028 -0.038 -0.013 -0.026 -0.031 -0.0560 ...] Commented Dec 7, 2017 at 1:26
  • According to Wikipedia, numpy uses pairwise summation, which on a huge array such as yours should have better accuracy than the naive method. I don't know what Matlab does, though. Commented Dec 7, 2017 at 1:51
  • In Python, you could try np.nanmean(vel, dtype=np.longdouble), but that won't make a difference if you are on Windows, where longdouble is the same as double; see the sketch after these comments. Commented Dec 7, 2017 at 2:01
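A minimal sketch of that last suggestion (vel below is a random stand-in for the asker's one-million-point array, which is not available):

import numpy as np

# hypothetical stand-in for the real data
vel = np.random.uniform(-0.05, 0.05, (10**6,))

m64 = np.nanmean(vel)                       # default float64 accumulation
mld = np.nanmean(vel, dtype=np.longdouble)  # wider accumulator where the platform has one
print(m64 - np.float64(mld))                # size of the accumulation difference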

1 Answer

One possible source of the initial difference you describe (between the means) could be numpy's use of pairwise summation, which on large arrays will typically be appreciably more accurate than naive summation:

a = np.random.uniform(-1, 1, (10**6,))
a = np.r_[-a, a]
# so the sum should be zero

a.sum()
# 7.815970093361102e-14

# use cumsum to get naive summation:
a.cumsum()[-1]
# -1.3716805469243809e-11
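For intuition, here is a toy recursive version of pairwise summation (numpy's actual routine is blocked C code, so this is only a sketch of the idea):

def pairwise_sum(x):
    # split in half, sum each half recursively, then add the two partial sums;
    # rounding error grows roughly like O(log n) instead of the O(n) of
    # naive left-to-right accumulation
    if len(x) <= 8:
        total = 0.0
        for v in x:
            total += v
        return total
    mid = len(x) // 2
    return pairwise_sum(x[:mid]) + pairwise_sum(x[mid:])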

Edit (thanks @sascha): for the last word and as a "provably exact" reference you could use math.fsum:

import math
math.fsum(a)
# 0.0
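One caveat, since the question uses nanmean: math.fsum returns nan as soon as the data contain a NaN, so an exact reference for np.nanmean needs the NaNs dropped first. A small sketch:

import math
import numpy as np

def exact_nanmean(x):
    # exact-summation reference for np.nanmean: filter out NaNs, then fsum
    finite = x[~np.isnan(x)]
    return math.fsum(finite) / finite.size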

I don't have Matlab, so I can't check what it does.


2 Comments

  • And just for fun: math.fsum(a.tolist()) (even more precise, but probably much slower).
  • @sascha Neat! It is rather slow, but funnily enough it seems faster than Python's sum... (see the timing sketch below)
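If you want to check those speed claims on your own machine, a quick timeit sketch (numbers will vary by platform):

import timeit

setup = "import math; import numpy as np; a = np.random.uniform(-1, 1, (10**6,))"
for stmt in ["math.fsum(a)", "math.fsum(a.tolist())", "sum(a)", "a.sum()"]:
    t = timeit.timeit(stmt, setup=setup, number=10)
    print(f"{stmt:25} {t / 10:.4f} s per call")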
