
I'm reading the book Python for Data Analysis, in the part about NumPy boolean indexing. It says that "selecting data from an array by boolean indexing always creates a copy of the data". So why can I change the original array using boolean indexing? Could anyone help me? Thanks a lot. Here is the example:

In [86]: data
Out[86]:
array([[-0.048 ,  0.5433, -0.2349,  1.2792],
       [-0.268 ,  0.5465,  0.0939, -2.0445],
       [-0.047 , -2.026 ,  0.7719,  0.3103],
       [ 2.1452,  0.8799, -0.0523,  0.0672],
       [-1.0023, -0.1698,  1.1503,  1.7289],
       [ 0.5994,  0.8174, -0.9297, -1.2564]])

In [96]: data[data < 0] = 0

In [97]: data
Out[97]:
array([[ 0.    ,  0.5433,  0.    ,  1.2792],
       [ 0.    ,  0.5465,  0.0939,  0.    ],
       [ 0.    ,  0.    ,  0.7719,  0.3103],
       [ 2.1452,  0.8799,  0.    ,  0.0672],
       [ 0.    ,  0.    ,  1.1503,  1.7289],
       [ 0.5994,  0.8174,  0.    ,  0.    ]])

2 Answers


When boolean indexing is used in a fetch (__getitem__), it does return a copy. But when it is used directly on the left of an assignment, Python calls __setitem__ instead, and the selected values of the original array are changed:

In [196]: data = np.arange(10)
In [197]: d1 = data[data<5]
In [198]: d1                 # a copy
Out[198]: array([0, 1, 2, 3, 4])
In [199]: d1[:] = 0
In [200]: data               # no change to the original
Out[200]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Masked assignment:

In [201]: data[data<5] = 0
In [202]: data
Out[202]: array([0, 0, 0, 0, 0, 5, 6, 7, 8, 9])   # changed data

Chained (indirect) assignment does nothing to the original:

In [204]: data[data<5][:] = 1
In [205]: data
Out[205]: array([0, 0, 0, 0, 0, 5, 6, 7, 8, 9])
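To see which special methods Python actually calls in each of these forms, here is a small sketch using a hypothetical CallRecorder class (not part of NumPy) that only logs its __getitem__ and __setitem__ calls. It shows plain Python dispatch, which is the same dispatch a NumPy array goes through.

```python
class CallRecorder:
    """Hypothetical stand-in for an array: it only records which
    indexing methods Python invokes on it."""

    def __init__(self, log=None):
        self.log = [] if log is None else log

    def __getitem__(self, key):
        self.log.append(("get", key))
        # Return a separate object (like a temporary copy) sharing the same log.
        return CallRecorder(self.log)

    def __setitem__(self, key, value):
        self.log.append(("set", key, value))


obj = CallRecorder()
obj["mask"] = 0                # direct masked assignment
print(obj.log)                 # [('set', 'mask', 0)]

obj.log.clear()
obj["mask"][:] = 1             # chained assignment
print(obj.log)                 # [('get', 'mask'), ('set', slice(None, None, None), 1)]
```

The direct form is a single __setitem__ on the original object; the chained form first calls __getitem__ and then assigns into whatever that returned.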

Think of it as data.__getitem__(mask).__setitem__(slice(None), 1). The __getitem__ call returns a copy, which __setitem__ then changes, but the original is left untouched.
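A minimal sketch that spells out both forms as explicit method calls on a NumPy array, assuming the same np.arange(10) data as the session above. Calling the dunder methods directly is only for illustration; in real code you would write the indexing syntax.

```python
import numpy as np

data = np.arange(10)
mask = data < 5

# data[mask] = 0 is a single __setitem__ call on data itself.
data.__setitem__(mask, 0)
print(data.tolist())  # [0, 0, 0, 0, 0, 5, 6, 7, 8, 9]

# data[mask][:] = 1 is __getitem__ (which returns a copy) followed by
# __setitem__ on that temporary copy, so data is not modified.
data.__getitem__(mask).__setitem__(slice(None), 1)
print(data.tolist())  # [0, 0, 0, 0, 0, 5, 6, 7, 8, 9]
```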

So if you need advanced indexing on the LHS, make sure it is the last indexing step, immediately before the assignment. You can't chain two advanced indexing steps on the LHS.
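As a sketch of that rule (the "set small even numbers to -1" task is just a made-up example): chaining two boolean selections on the LHS silently does nothing, while combining the conditions into one mask works. A basic slice as the first step is fine, because a slice is a view.

```python
import numpy as np

data = np.arange(10)

# Two advanced-indexing steps on the LHS: data[sub] is a temporary copy,
# so the assignment into it is lost.
sub = data < 5
data[sub][data[sub] % 2 == 0] = -1
print(data.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# One combined boolean mask: a single advanced-indexing step right before '='.
data[(data < 5) & (data % 2 == 0)] = -1
print(data.tolist())  # [-1, 1, -1, 3, -1, 5, 6, 7, 8, 9]

# Basic slicing first is fine: data[:5] is a view of data's buffer,
# so the boolean assignment on it reaches the original array.
data[:5][data[:5] == 1] = 100
print(data.tolist())  # [-1, 100, -1, 3, -1, 5, 6, 7, 8, 9]
```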

View vs. copy

With basic indexing, NumPy can reuse the original data buffer and just change attributes like shape and strides. For example:

In [85]: x = np.arange(10)
In [86]: x.shape
Out[86]: (10,)
In [87]: x.strides
Out[87]: (4,)

In [88]: y = x[::2]
In [89]: y.shape
Out[89]: (5,)
In [90]: y.strides
Out[90]: (8,)

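y shares the data buffer of x (compare the __array_interface__ dictionaries of x and y). x uses all ten 4-byte elements; y uses every other one (its stride steps by 8 bytes instead of 4). This session ran with a 4-byte default integer; where the default integer is 8 bytes (e.g. most 64-bit Linux and macOS builds), the strides are (8,) and (16,) instead.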
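A short sketch for checking that claim yourself. It uses np.shares_memory and the base attribute alongside __array_interface__, and compares strides against itemsize so it holds whether the default integer is 4 or 8 bytes.

```python
import numpy as np

x = np.arange(10)
y = x[::2]  # basic slicing returns a view

# y starts at x[0], so both report the same buffer address.
same_start = x.__array_interface__["data"][0] == y.__array_interface__["data"][0]
print(same_start)                      # True
print(np.shares_memory(x, y))          # True
print(y.base is x)                     # True

# y skips every other element: its stride is two item sizes.
print(y.strides[0] == 2 * x.itemsize)  # True

y[0] = 99    # writing through the view...
print(x[0])  # 99  ...changes x as well
```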

But with advanced indexing you can't express the element selection in terms of shape and strides.

In [98]: z = x[[1,2,6,7,0]]
In [99]: z.shape
Out[99]: (5,)
In [100]: z.strides
Out[100]: (4,)

With advanced indexing, items of the original array can be selected in any order and with repetitions, so in general there is no regular pattern that a start offset plus strides could describe. That is why a copy is required.
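A minimal sketch contrasting this with the slice example above: integer-array indexing (in any order, with repetitions) and boolean indexing both produce new arrays that do not share memory with x, so changing them leaves x alone.

```python
import numpy as np

x = np.arange(10)
z = x[[1, 2, 6, 7, 0]]  # integer-array (advanced) indexing, arbitrary order
r = x[[0, 0, 1]]        # repetitions are allowed too
m = x[x % 3 == 0]       # boolean indexing is also advanced indexing

print(r.tolist())       # [0, 0, 1]
print(np.shares_memory(x, z), np.shares_memory(x, m))  # False False

z[:] = -1               # modifying the copies...
m[:] = -1
print(x.tolist())       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  ...leaves x unchanged
```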


4 Comments

Thanks. I'm just a beginner and have never coded before. Could you please explain more about "2 advanced indexing steps on the LHS"? I didn't even know what LHS means.
LHS = left-hand side, the stuff to the left of the =. I'm trying to warn about a case where the assignment wouldn't work because it creates a hidden copy. Your example works because there isn't an intermediate copy.
Got it, but I wonder why boolean indexing returns a copy, given that NumPy is designed for large data and basic indexing and slicing just return views on the original array. Is there a specific design reason?
I edited my answer, trying to illustrate the difference between basic and advanced indexing.

Boolean indexing returns a copy of the data, not a view of the original data like the one you get from slicing.

>>> b=data[data<0]; b # this is a copy of data
array([-0.048 , -0.2349, -0.268 , -2.0445, -0.047 , -2.026 , -0.0523,
       -1.0023, -0.1698, -0.9297, -1.2564])

I can modify b, and data is left unchanged.

>>> b[:] = 0; b
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
>>> data
array([[-0.048 ,  0.5433, -0.2349,  1.2792],
       [-0.268 ,  0.5465,  0.0939, -2.0445],
       [-0.047 , -2.026 ,  0.7719,  0.3103],
       [ 2.1452,  0.8799, -0.0523,  0.0672],
       [-1.0023, -0.1698,  1.1503,  1.7289],
       [ 0.5994,  0.8174, -0.9297, -1.2564]])

Now, for a slice:

>>> a = data[0,:]; a # a is not a copy of data 
array([-0.048 ,  0.5433, -0.2349,  1.2792])
>>> a[:] = 0; a
array([ 0.,  0.,  0.,  0.])
>>> data
array([[ 0.    ,  0.    ,  0.    ,  0.    ],
       [-0.268 ,  0.5465,  0.0939, -2.0445],
       [-0.047 , -2.026 ,  0.7719,  0.3103],
       [ 2.1452,  0.8799, -0.0523,  0.0672],
       [-1.0023, -0.1698,  1.1503,  1.7289],
       [ 0.5994,  0.8174, -0.9297, -1.2564]])

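However, as you've noticed, an assignment with a boolean-indexed array on the left-hand side (data[mask] = value) is applied to the original data, because it is a single data.__setitem__ call rather than a copy followed by an assignment.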

>>> data[data<0] = 1; data
array([[ 1.    ,  0.5433,  1.    ,  1.2792],
       [ 1.    ,  0.5465,  0.0939,  1.    ],
       [ 1.    ,  1.    ,  0.7719,  0.3103],
       [ 2.1452,  0.8799,  1.    ,  0.0672],
       [ 1.    ,  1.    ,  1.1503,  1.7289],
       [ 0.5994,  0.8174,  1.    ,  1.    ]])
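For comparison, a sketch of the same replacement done without mutating the input: np.where builds a brand-new array, while masked assignment writes into the array it is applied to. The 2x2 array here is just the top-left corner of the question's data, used as a small stand-in.

```python
import numpy as np

data = np.array([[-0.048, 0.5433],
                 [-0.268, 0.5465]])

# Masked assignment: one __setitem__ call, modifies the target in place.
inplace = data.copy()
inplace[inplace < 0] = 1

# np.where returns a new array and leaves its input untouched.
replaced = np.where(data < 0, 1.0, data)

print(np.array_equal(inplace, replaced))  # True
print(data[0, 0])                         # -0.048 (original unchanged)
```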

