Numpy array subtraction: inconsistent values for large arrays

Question

Here's a problem I came across today: I am trying to subtract the first row of a (large) matrix from every row of that matrix. As a test, I made all rows equal. Here's an MWE:

import numpy as np

first = np.random.normal(size=10)
reference = np.repeat((first,), 10000, axis=0)  # 10000 identical rows
copy_a = np.copy(reference)
copy_a -= copy_a[0]              # subtract row 0 from every row, in place
print(np.all(copy_a == 0))       # prints False

Oh wow - False! So I tried another thing:

copy_b = np.copy(reference)
copy_b -= reference[0]           # subtrahend comes from a different array
print(np.all(copy_b == 0))       # prints True

Examining the new copy_a array, I found that copy_a[:819] (rows 0 through 818) are all zeros and copy_a[820:] still hold the original values. Meanwhile, copy_a[819] was only partly modified.

In [115]: copy_a[819]
Out[115]: 
array([ 0.        ,  0.        ,  0.57704706, -0.22270692, -1.83793342,
        0.58976187, -0.71014837,  1.80517635, -0.98758385, -0.65062774])

It looks like, midway through the operation, numpy re-read copy_a[0], found that it was all zeros, and therefore subtracted zeros from the rest of the array. I find this weird. Is this a bug, or is it an expected numpy result?

The number of elements for which -= worked correctly in your example is 8192, which is exactly 2^13! (DYZ, commented Dec 5, 2016 at 6:42)

Community · Accepted Answer · 2017-05-23 11:45:55Z

3

This issue has been reported to the numpy repository multiple times. It is considered a bug. However, it is very hard to fix without sacrificing performance (by copying the input arrays), because correctly detecting whether two arrays share memory is difficult.
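The memory-sharing distinction can be checked directly. The sketch below assumes NumPy 1.11 or later, where np.shares_memory exists, and reuses the question's setup. The 3×4 array x is a hypothetical extra example. np.may_share_memory only compares the memory bounds of its arguments, so it is cheap but can report false positives. np.shares_memory instead solves the exact overlap problem, and the NumPy documentation warns that this can be expensive in general.

```python
import numpy as np

first = np.random.normal(size=10)
reference = np.repeat((first,), 10000, axis=0)
copy_a = np.copy(reference)
copy_b = np.copy(reference)

# copy_a[0] is a view into copy_a's own buffer, so the operands alias.
aliased = np.shares_memory(copy_a, copy_a[0])
# reference[0] lives in a different allocation than copy_b.
independent = np.shares_memory(copy_b, reference[0])

# Two columns of one C-ordered array interleave in memory (their bounds
# overlap) without ever touching the same element.
x = np.zeros((3, 4))
bounds_overlap = np.may_share_memory(x[:, 0], x[:, 1])  # bounds-only check
exact_overlap = np.shares_memory(x[:, 0], x[:, 1])      # exact check

print(aliased, independent, bounds_overlap, exact_overlap)
```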

Therefore, for now, you'd better just make a copy of copy_a[0] as explained in @Torben's answer.

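The essence of the issue is that you are modifying the array while iterating over it. It happens to work for all of copy_a[:819] and the first two elements of copy_a[819] simply because 8192 (= 819×10 + 2) elements is the size of numpy's assign buffer. The first buffered chunk is computed with the original copy_a[0]. Every later chunk sees a copy_a[0] that has already been zeroed.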
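The buffer explanation can be reproduced without numpy. The function simulate_buffered_isub below is a hypothetical, simplified model, not numpy's actual iterator code. It walks a row-major list of 10000 × 10 values in chunks of 8192 elements, which is the default that np.getbufsize() reports. It also assumes that row 0 is re-read at the start of each chunk. The values 1.0 to 10.0 stand in for the random first row. Under these assumptions, the model reproduces the pattern described in the question.

```python
def simulate_buffered_isub(flat, ncols, bufsize=8192):
    """Hypothetical model of `a -= a[0]` on a C-contiguous 2-D array whose
    elements are stored row by row in the list `flat`.

    Assumption for illustration only: elements are processed in chunks of
    `bufsize`, and at the start of each chunk the broadcast operand (row 0)
    is copied from the *current* contents of `flat`.
    """
    for start in range(0, len(flat), bufsize):
        stop = min(start + bufsize, len(flat))
        # Buffer row 0 as it is now; after the first chunk it is all zeros.
        row0 = flat[:ncols]
        for i in range(start, stop):
            flat[i] -= row0[i % ncols]
    return flat


rows, ncols = 10000, 10
first = [float(c + 1) for c in range(ncols)]  # nonzero stand-in for the data
flat = simulate_buffered_isub(first * rows, ncols)

# Rows that ended up entirely zero.
zero_rows = [r for r in range(rows)
             if all(v == 0.0 for v in flat[r * ncols:(r + 1) * ncols])]
print(len(zero_rows))                   # 819, i.e. rows 0 through 818
print(flat[819 * ncols:820 * ncols])    # [0.0, 0.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
```

Passing bufsize=rows * ncols, so that the whole array forms one chunk, zeroes every row instead. This shows how, in this model, the result depends on the buffer size.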

edited May 23, 2017 at 11:45

CommunityBot


answered Dec 5, 2016 at 6:59

kennytm



Torben Klein · Accepted Answer · 2016-12-05 06:42:03Z

3

The augmented assignment operator -= modifies the array in place, so you are pulling the rug out from under your own feet. The effect you see might have to do with internal buffering of results. In that case, the first "commit" would happen after 8192 elements, which is 819 full rows plus two elements.

The solution is to copy the subtrahend into a separate array first:

copy_a -= copy_a[0].copy()
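Here is a runnable version of the fix, reusing the question's setup. This is a sketch; the copy_c line is an extra alternative and is not part of the answer. Copying row 0 first means the right-hand operand no longer aliases the array being written, so the result does not depend on buffering. The out-of-place form also avoids aliasing, at the cost of allocating a second 10000 × 10 array.

```python
import numpy as np

first = np.random.normal(size=10)
reference = np.repeat((first,), 10000, axis=0)  # 10000 identical rows

# In-place fix: subtract an independent copy of row 0.
copy_a = np.copy(reference)
copy_a -= copy_a[0].copy()
print(np.all(copy_a == 0))  # True

# Out-of-place alternative: the result is written to a new array,
# so neither input is modified during the subtraction.
copy_c = reference - reference[0]
print(np.all(copy_c == 0))  # True
```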

answered Dec 5, 2016 at 6:42

Torben Klein


3 Comments

VBB Over a year ago

Nice. Seems to match up with the 2^13 pointed out by @DYZ above. But doesn't this seem like a bad idea? It is the inconsistency that bothers me. I'd expect this to either give all zeros, or give up right after the first row (strict inplace). In the current implementation, it seems like the final answer will depend on the cache size.

BrenBarn Over a year ago

@VBB: I think the idea is just that self-referential in-place operations have undefined behavior.

Torben Klein Over a year ago

Yes, exactly. Generally, when you work with large arrays you need to have a clear picture of the underlying memory allocations, or else your RAM will be gone quickly. :-)
