Python numpy indexing copy

Question

I'm reading the book Python for Data Analysis, in the part about NumPy boolean indexing. It says that selecting data from an array by boolean indexing always creates a copy of the data. Then why can I change the original array using boolean indexing? Could anyone help me? Thanks a lot. Here is the example:

In [86]: data
Out[86]:
array([[-0.048 , 0.5433, -0.2349, 1.2792],
[-0.268 , 0.5465, 0.0939, -2.0445],
[-0.047 , -2.026 , 0.7719, 0.3103],
[ 2.1452, 0.8799, -0.0523, 0.0672],
[-1.0023, -0.1698, 1.1503, 1.7289],
[ 0.5994, 0.8174, -0.9297, -1.2564]])

In [96]: data[data < 0] = 0
In [97]: data
Out[97]:
array([[ 0. , 0.5433, 0. , 1.2792],
[ 0. , 0.5465, 0.0939, 0. ],
[ 0. , 0. , 0.7719, 0.3103],
[ 2.1452, 0.8799, 0. , 0.0672],
[ 0. , 0. , 1.1503, 1.7289],
[ 0.5994, 0.8174, 0. , 0. ]])

hpaulj · Accepted Answer · 2017-02-28 18:39:10Z

3

In a fetch (a __getitem__ call), boolean indexing does return a copy. But when it is used directly on the left of an assignment, it's a __setitem__ call, and the selected values in the original array are changed:

In [196]: data = np.arange(10)
In [197]: d1 = data[data<5]
In [198]: d1                 # a copy
Out[198]: array([0, 1, 2, 3, 4])
In [199]: d1[:] = 0
In [200]: data               # no change to the original
Out[200]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Masked assignment:

In [201]: data[data<5] = 0
In [202]: data
Out[202]: array([0, 0, 0, 0, 0, 5, 6, 7, 8, 9])   # changed data

Indirect assignment does nothing:

In [204]: data[data<5][:] = 1
In [205]: data
Out[205]: array([0, 0, 0, 0, 0, 5, 6, 7, 8, 9])

Think of it as data.__getitem__(mask).__setitem__(slice(None), 1). The __getitem__ returns a copy, and the __setitem__ changes that copy, not the original.
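The sketch below spells out the method calls Python makes for each of the two statements. It starts from a fresh np.arange(10), as in the example. Calling the dunder methods directly is only for illustration; normal code uses the [] syntax.

```python
import numpy as np

data = np.arange(10)
mask = data < 5

# data[mask][:] = 1 runs as a fetch followed by a store on the fetched result:
tmp = data.__getitem__(mask)        # boolean fetch -> a new array (a copy)
tmp.__setitem__(slice(None), 1)     # the store modifies only that copy
print(data)                         # [0 1 2 3 4 5 6 7 8 9]  (unchanged)

# data[mask] = 1 is a single store on the original array:
data.__setitem__(mask, 1)
print(data)                         # [1 1 1 1 1 5 6 7 8 9]
```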

So if you need to use advanced indexing on the LHS, make sure it is the last step, immediately before the assignment. You can't chain two advanced indexing steps on the LHS.
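When a task seems to need two selection steps, you can merge them into a single index before assigning. This sketch assumes a 1-D array and an illustrative goal: zero only the first two negative values. np.flatnonzero turns the boolean mask into integer positions. You can slice those positions and then use them in one assignment. For a 2-D array the positions are flat, so you would assign through data.flat[idx] instead.

```python
import numpy as np

data = np.array([3, -1, 4, -1, -5, 9, -2])

# Two advanced-indexing steps on the LHS: the first step makes a copy,
# so the slice assignment lands in that temporary copy, not in data.
data[data < 0][:2] = 0
print(data.tolist())   # [3, -1, 4, -1, -5, 9, -2]  (unchanged)

# Collapse the selection into one integer index array, then assign once.
idx = np.flatnonzero(data < 0)[:2]  # positions of the first two negatives
data[idx] = 0
print(data.tolist())   # [3, 0, 4, 0, -5, 9, -2]
```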

view v copy

With basic indexing it is possible to reuse the original data buffer and just change attributes like shape and strides. For example:

In [85]: x = np.arange(10)
In [86]: x.shape
Out[86]: (10,)
In [87]: x.strides
Out[87]: (4,)

In [88]: y = x[::2]
In [89]: y.shape
Out[89]: (5,)
In [90]: y.strides
Out[90]: (8,)

y uses the same data buffer as x (compare the __array_interface__ dictionaries of x and y). x uses all ten 4-byte elements (np.arange produced int32 on this platform). y uses every other one: its stride steps by 8 bytes instead of 4.
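np.shares_memory checks for a shared buffer without comparing the __array_interface__ dictionaries by eye. This sketch fixes dtype=np.int32 so the strides match the 4-byte numbers in the transcript. With an 8-byte default integer (common on 64-bit Linux and macOS) the strides double, but the relationship is the same.

```python
import numpy as np

x = np.arange(10, dtype=np.int32)   # 4-byte elements, as in the transcript
y = x[::2]                          # basic slicing -> a view

print(x.strides, y.strides)         # (4,) (8,)
print(np.shares_memory(x, y))       # True: same data buffer
# Both arrays start at the same address in that buffer:
print(x.__array_interface__['data'][0] == y.__array_interface__['data'][0])  # True

y[0] = 100                          # writing through the view...
print(x[0])                         # ...is visible in x: 100
```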

But with advanced indexing you can't express the element selection in terms of shape and strides.

In [98]: z = x[[1,2,6,7,0]]
In [99]: z.shape
Out[99]: (5,)
In [100]: z.strides
Out[100]: (4,)

Items in the original array can be selected in any order and with repetitions, so there is no regular pattern that shape and strides could describe. A copy is required.
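The same check on the fancy-indexed results shows the copies. The extra index list [0, 0, 3] is an illustrative choice that contains a repetition. No single start and stride can produce it, so NumPy has to allocate a new buffer.

```python
import numpy as np

x = np.arange(10, dtype=np.int32)
z = x[[1, 2, 6, 7, 0]]      # advanced (integer-array) indexing
r = x[[0, 0, 3]]            # repeated index: not expressible with strides

print(np.shares_memory(x, z), np.shares_memory(x, r))  # False False

z[:] = -1                   # modifying the copy...
print(x)                    # ...leaves x untouched: [0 1 2 3 4 5 6 7 8 9]
print(r)                    # [0 0 3]
```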

edited Feb 28, 2017 at 18:39

answered Feb 26, 2017 at 3:22

hpaulj



4 Comments

sutungpo Over a year ago

Thanks. I'm just a beginner and have never coded before. Could you please explain more about the two advanced indexing steps on the LHS? I don't even know what LHS means.

hpaulj Over a year ago

LHS = left-hand side, the stuff to the left of the =. I'm trying to warn about a case where the assignment wouldn't work because it creates a hidden copy. Your example works because there isn't an intermediate copy.

sutungpo Over a year ago

Got it. But I wonder why boolean indexing returns a copy. NumPy is designed for large data, and basic indexing and slicing just return views on the original array. Is there a specific design reason?

hpaulj Over a year ago

I edited my answer, trying to illustrate the difference between basic and advanced indexing.

Crispin · Accepted Answer · 2017-02-26 02:53:39Z

Boolean indexing returns a copy of the data, not a view of the original data, like one gets for slices.

>>> b=data[data<0]; b # this is a copy of data
array([-0.048 , -0.2349, -0.268 , -2.0445, -0.047 , -2.026 , -0.0523,
       -1.0023, -0.1698, -0.9297, -1.2564])

I can manipulate b and data is preserved.

>>> b[:] = 0; b
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
>>> data
array([[-0.048 ,  0.5433, -0.2349,  1.2792],
       [-0.268 ,  0.5465,  0.0939, -2.0445],
       [-0.047 , -2.026 ,  0.7719,  0.3103],
       [ 2.1452,  0.8799, -0.0523,  0.0672],
       [-1.0023, -0.1698,  1.1503,  1.7289],
       [ 0.5994,  0.8174, -0.9297, -1.2564]])

Now, for a slice:

>>> a = data[0,:]; a # a is a view of data, not a copy
array([-0.048 ,  0.5433, -0.2349,  1.2792])
>>> a[:] = 0; a
array([ 0.,  0.,  0.,  0.])
>>> data
array([[ 0.    ,  0.    ,  0.    ,  0.    ],
       [-0.268 ,  0.5465,  0.0939, -2.0445],
       [-0.047 , -2.026 ,  0.7719,  0.3103],
       [ 2.1452,  0.8799, -0.0523,  0.0672],
       [-1.0023, -0.1698,  1.1503,  1.7289],
       [ 0.5994,  0.8174, -0.9297, -1.2564]])
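You can also check the view/copy distinction directly instead of mutating data. This sketch replaces the random data above with a small illustrative 2x3 array. It compares a row slice with a boolean selection using np.shares_memory.

```python
import numpy as np

data = np.array([[-1.0, 2.0, -3.0],
                 [ 4.0, -5.0, 6.0]])

a = data[0, :]          # basic indexing: a view of row 0
b = data[data < 0]      # boolean indexing: a new 1-D array

print(np.shares_memory(a, data))  # True
print(np.shares_memory(b, data))  # False
print(a.base is data)             # True: a borrows data's buffer
```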

However, as you've identified, an assignment made directly through an index expression (data[mask] = value) always modifies the original data.

>>> data[data<0] = 1; data
array([[ 1.    ,  0.5433,  1.    ,  1.2792],
       [ 1.    ,  0.5465,  0.0939,  1.    ],
       [ 1.    ,  1.    ,  0.7719,  0.3103],
       [ 2.1452,  0.8799,  1.    ,  0.0672],
       [ 1.    ,  1.    ,  1.1503,  1.7289],
       [ 0.5994,  0.8174,  1.    ,  1.    ]])
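Besides data[mask] = value, NumPy offers functions that do the same masked, in-place update. This sketch uses np.putmask, and np.copyto with its where= argument, on a small illustrative array. np.where is included for contrast, because it returns a new array rather than modifying its input.

```python
import numpy as np

data = np.array([-1.5, 2.0, -0.5, 3.0])

d1 = data.copy()
np.putmask(d1, d1 < 0, 0.0)             # in place: negatives become 0

d2 = data.copy()
np.copyto(d2, 0.0, where=d2 < 0)        # in place, same effect

d3 = np.where(data < 0, 0.0, data)      # returns a NEW array; data is not modified

print(d1, d2, d3)
print(data)                             # still holds its negative values
```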
