
While searching for some numpy stuff, I came across a question discussing the rounding accuracy of numpy.dot():

Numpy: Difference between dot(a,b) and (a*b).sum()

Since I happen to have two (different) computers with Haswell CPUs sitting on my desk, which should provide FMA and everything, I thought I'd test the example given by Ophion in the first answer, and I got a result that somewhat surprised me:

After updating/installing/fixing lapack/blas/atlas/numpy, I get the following on both machines:

>>> a = np.ones(1000, dtype=np.float128)+1e-14
>>> (a*a).sum()
1000.0000000000199999
>>> np.dot(a,a)
1000.0000000000199948

>>> a = np.ones(1000, dtype=np.float64)+1e-14
>>> (a*a).sum()
1000.0000000000198
>>> np.dot(a,a)
1000.0000000000176

So the standard multiplication + sum() is more precise than np.dot(). timeit, however, confirmed that the .dot() version is faster (though not by much) for both float64 and float128.
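One way to check which result is actually closer: every element of a is the same double, so the true sum of squares is simply 1000 times the exact square of that element. A quick sanity check with exact rational arithmetic via Python's fractions module (my sketch, not part of the timings above):

from fractions import Fraction
import numpy as np

a = np.ones(1000, dtype=np.float64) + 1e-14
x = Fraction(float(a[0]))                  # exact rational value of one element
exact = 1000 * x * x                       # true sum of squares, no rounding
print(float(exact))                        # ~1000.00000000002
print(abs((a * a).sum() - float(exact)))   # error of multiply + sum()
print(abs(np.dot(a, a) - float(exact)))    # error of np.dot()

Both results above are consistent with sum() landing closer to the true value.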

Can anyone provide an explanation for this?

edit: I accidentally deleted the info on numpy versions: same results for 1.9.0 and 1.9.3 with python 3.4.0 and 3.4.1.

4 Comments

  • Interestingly enough, I only get this discrepancy on NumPy 1.9.2, and not on NumPy 1.8.2. Both use blas + lapack (not atlas). With NumPy 1.8.2 the results are identical for dot and sum, suggesting identical rounding events; on NumPy 1.9.2, multiplication + sum() is more precise. Commented Oct 5, 2015 at 18:25
  • See docs.scipy.org/doc/numpy/… Commented Oct 5, 2015 at 19:17
  • Also github.com/numpy/numpy/pull/3685, which is where the change in sum was implemented. Commented Oct 5, 2015 at 19:18
  • Can anyone confirm that ATLAS uses FMA? I know the latest MKL does; should be an interesting comparison. Commented Oct 5, 2015 at 23:50

1 Answer


It looks like they recently added a special pairwise summation to ndarray.sum for improved numerical stability.

From PR 3685, this affects:

all add.reduce calls that go over float_add with IS_BINARY_REDUCE true
so this also improves mean/std/var and anything else that uses sum.

See PR 3685 for the code changes.
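Pairwise summation is a divide-and-conquer scheme: split the array in half, sum each half recursively, then add the two partial sums, so the rounding error grows roughly as O(log n) rather than the O(n) of a naive left-to-right accumulation. A minimal sketch of the idea (NumPy's actual C implementation instead unrolls fixed-size blocks for speed):

import numpy as np

def pairwise_sum(x, cutoff=8):
    # Below the cutoff, fall back to a plain running total.
    if len(x) <= cutoff:
        s = 0.0
        for v in x:
            s += v
        return s
    # Otherwise sum each half separately and combine the partial sums,
    # which keeps the intermediate sums at similar magnitudes.
    mid = len(x) // 2
    return pairwise_sum(x[:mid], cutoff) + pairwise_sum(x[mid:], cutoff)

a = np.ones(1000, dtype=np.float64) + 1e-14
print(pairwise_sum(a * a))   # close to the improved ndarray.sum result
print(np.dot(a, a))          # BLAS dot: summation order depends on the library

np.dot, by contrast, defers to whatever BLAS routine is linked in, so its accumulation order (and use of FMA) is an implementation detail of that library.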


2 Comments

A "divide and conquer" algorithm. Yes, that would make sense, thanks.
Does this mean we are actually better off using alternative strategies to compute matrix operations than the integrated functions? That would be ironic, given that numpy's purpose is to ease scientific work.
