Here's a problem I came across today: I am trying to subtract the first row of a (large) matrix from every row of that matrix. As a test, I made all rows equal. Here's an MWE:

import numpy as np
first = np.random.normal(size=10)
reference = np.repeat((first,), 10000, axis=0)
copy_a = np.copy(reference)
copy_a -= copy_a[0]
print(np.all(copy_a == 0))  # prints False

Oh wow - False! So I tried another thing:

copy_b = np.copy(reference)
copy_b -= reference[0]
print(np.all(copy_b == 0))  # prints True

Examining the new copy_a array, I found that copy_a[0:819] (rows 0 through 818) are all zeros, copy_a[820:] still hold the original values, and copy_a[819] was only partly processed.

In [115]: copy_a[819]
Out[115]: 
array([ 0.        ,  0.        ,  0.57704706, -0.22270692, -1.83793342,
        0.58976187, -0.71014837,  1.80517635, -0.98758385, -0.65062774])

Looks like midway during the operation, numpy went back and looked at copy_a[0], found it is all zeros, and hence subtracted zeros from the rest of the array. I find this weird. Is this a bug, or is it an expected numpy result?

  • The number of elements for which -= worked correctly in your example is 8192, which is exactly 2^13! Commented Dec 5, 2016 at 6:42

2 Answers

This issue has actually been reported multiple times to the numpy repository (see below). It is considered a bug, but is very hard to fix without sacrificing performance (copying the input arrays) because correctly detecting if two arrays share memory is difficult.
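To make the "detecting shared memory is difficult" point concrete, here is a minimal sketch using two checks that NumPy exposes: np.may_share_memory, which only compares memory bounds and can report false positives, and np.shares_memory, which answers the question exactly but can be expensive. The array names x, col0 and col1 are illustrative and not part of the question.

```python
import numpy as np

x = np.zeros((3, 4))
col0, col1 = x[:, 0], x[:, 1]

# Bounds-only check: cheap, but the interleaved columns look like an overlap.
cheap = np.may_share_memory(col0, col1)
# Exact check: no element is actually shared between the two columns.
exact = np.shares_memory(col0, col1)
print(cheap, exact)

# In the question, copy_a[0] is a view into copy_a, so the overlap is real.
copy_a = np.ones((5, 10))
overlap = np.shares_memory(copy_a, copy_a[0])
print(overlap)
```

The gap between the cheap and the exact answer is the trade-off described above: a fast check is too conservative, and an exact one costs time on every operation.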

Therefore, for now, you'd better just make a copy of copy_a[0] as explained in @Torben's answer.

The essence of the issue is that you are modifying the array while iterating over it. It happens to work up to copy_a[819] simply because 8192 (819×10+2) elements is the size of numpy's assign buffer.
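The buffering effect can be reproduced with a toy model in pure Python. This is only a sketch under the assumption that the operation is carried out in chunks of 8192 elements, each chunk being read, computed and then written back; it is not numpy's actual implementation, and the function name buffered_inplace_subtract is made up for illustration.

```python
def buffered_inplace_subtract(flat, ncols, bufsize):
    """Toy model of flat[i] -= flat[i % ncols], done bufsize elements at a time."""
    for start in range(0, len(flat), bufsize):
        stop = min(start + bufsize, len(flat))
        # Results for this chunk are computed from the *current* contents...
        chunk = [flat[i] - flat[i % ncols] for i in range(start, stop)]
        # ...and only then written back, which zeroes row 0 after the first chunk.
        flat[start:stop] = chunk
    return flat


ncols, nrows, bufsize = 10, 10000, 8192
first = [float(k + 1) for k in range(ncols)]  # stand-in for the random first row
flat = first * nrows                          # all rows equal, stored row-major

buffered_inplace_subtract(flat, ncols, bufsize)
rows = [flat[r * ncols:(r + 1) * ncols] for r in range(nrows)]
zero_rows = sum(1 for row in rows if all(v == 0.0 for v in row))
print(zero_rows)   # number of rows that ended up all zero
print(rows[819])   # the partly processed row
```

In this model the first chunk sees the original row 0 and produces zeros; every later chunk sees the already-zeroed row 0 and leaves its data unchanged, which matches the pattern reported in the question.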


  1. https://github.com/numpy/numpy/issues/6119
  2. https://github.com/numpy/numpy/issues/5241
  3. https://github.com/numpy/numpy/issues/4802
  4. https://github.com/numpy/numpy/issues/2705
  5. https://github.com/numpy/numpy/issues/1683

The augmented assignment operator -= modifies the array in place, meaning that you are pulling the rug out from under your own feet. The effect that you see might have to do with internal buffering of results (i.e. the first "commit" happens after roughly 819 rows).

The solution is to swap out the subtrahend into another array:

copy_a -= copy_a[0].copy()
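For completeness, a short sketch of two ways to avoid the aliasing, reusing the setup from the question. The names fixed_copy and fixed_new are illustrative. Both variants should give all zeros regardless of how the in-place operation is buffered.

```python
import numpy as np

first = np.random.normal(size=10)
reference = np.repeat((first,), 10000, axis=0)

# Option 1: copy the row so the subtrahend no longer aliases the target.
fixed_copy = np.copy(reference)
fixed_copy -= fixed_copy[0].copy()

# Option 2: out-of-place subtraction writes into a freshly allocated array.
fixed_new = np.copy(reference)
fixed_new = fixed_new - fixed_new[0]

print(np.all(fixed_copy == 0), np.all(fixed_new == 0))
```

Option 1 only duplicates a single row, so it keeps the memory benefit of the in-place update; option 2 temporarily needs room for a second full-size array.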

3 Comments

Nice. Seems to match up with the 2^13 pointed out by @DYZ above. But doesn't this seem like a bad idea? It is the inconsistency that bothers me. I'd expect this to either give all zeros, or give up right after the first row (strict inplace). In the current implementation, it seems like the final answer will depend on the cache size.
@VBB: I think the idea is just that self-referential in-place operations have undefined behavior.
Yes, exactly. Generally, when you work with large arrays you need to have a clear picture of the underlying memory allocations, or else your RAM will be gone quickly. :-)