Creating new numpy arrays based on condition

Question

I have 2 numpy arrays:

aa = np.random.rand(5,5)
bb = np.random.rand(5,5)

How can I create a new array which has a value of 1 when both aa and bb exceed 0.5?

What do you mean by "both aa and bb exceed 0"? aa and bb are matrices containing values in the range [0, 1].
– BPL, Commented Aug 14, 2016 at 14:56
What happens when they are below 0.5? Is the corresponding value 0?
– user2285236, Commented Aug 14, 2016 at 15:04

Divakar · Accepted Answer · 2016-08-17 10:45:13Z

With a focus on performance, a few approaches can be added. One method is to build the boolean array of valid positions and convert it to an int datatype with the .astype() method. Another is to use np.where, which selects between 1 and 0 based on the same boolean array. So there are two conversion methods: one that relies on efficient datatype conversion and one that uses a selection criterion. The boolean array itself can also be obtained in two ways: with plain comparisons combined by &, or with np.logical_and. Two ways to get the boolean array and two ways to convert it to an int array give four implementations, listed below -

out1 = ((aa>0.5) & (bb>0.5)).astype(int)
out2 = np.logical_and(aa>0.5, bb>0.5).astype(int)
out3 = np.where((aa>0.5) & (bb>0.5),1,0)
out4 = np.where(np.logical_and(aa>0.5, bb>0.5), 1, 0)
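
As a quick sanity check, the four expressions can be compared on a small input. This is only a minimal sketch: the seeded generator from `np.random.default_rng` and the name `all_equal` are assumptions made here for reproducibility and are not part of the original answer.

```python
import numpy as np

# Seeded generator so the check is reproducible (an assumption for this sketch).
rng = np.random.default_rng(0)
aa = rng.random((5, 5))
bb = rng.random((5, 5))

out1 = ((aa > 0.5) & (bb > 0.5)).astype(int)
out2 = np.logical_and(aa > 0.5, bb > 0.5).astype(int)
out3 = np.where((aa > 0.5) & (bb > 0.5), 1, 0)
out4 = np.where(np.logical_and(aa > 0.5, bb > 0.5), 1, 0)

# All four arrays should hold the same 0/1 values.
all_equal = all(np.array_equal(out1, other) for other in (out2, out3, out4))
print(all_equal)
```

The four variants differ only in how the boolean mask is built and converted, so the choice between them comes down to speed and readability.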

You can also switch to lower-precision datatypes, which shouldn't hurt because the values are only ever 0 and 1. The benefit should be a noticeable speedup, since a smaller output array means less memory to write. The candidates are int8 and uint8, passed either as the strings 'int8' and 'uint8' or as np.int8 and np.uint8. The variants of the earlier approaches using these datatypes would be -

out5 = ((aa>0.5) & (bb>0.5)).astype('int8')
out6 = np.logical_and(aa>0.5, bb>0.5).astype('int8')
out7 = ((aa>0.5) & (bb>0.5)).astype('uint8')
out8 = np.logical_and(aa>0.5, bb>0.5).astype('uint8')

out9 = ((aa>0.5) & (bb>0.5)).astype(np.int8)
out10 = np.logical_and(aa>0.5, bb>0.5).astype(np.int8)
out11 = ((aa>0.5) & (bb>0.5)).astype(np.uint8)
out12 = np.logical_and(aa>0.5, bb>0.5).astype(np.uint8)
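
The memory side of this can be seen by comparing `nbytes` for two output datatypes. This is a rough sketch rather than part of the original answer; `np.int64` is spelled out explicitly because the width of the default `int` can depend on the platform, and the names `mask`, `as_int64` and `as_int8` are chosen here for illustration.

```python
import numpy as np

aa = np.random.rand(1000, 1000)
bb = np.random.rand(1000, 1000)

mask = (aa > 0.5) & (bb > 0.5)

as_int64 = mask.astype(np.int64)  # 8 bytes per element
as_int8 = mask.astype(np.int8)    # 1 byte per element

# Same 0/1 values, but the int8 result occupies one eighth of the memory.
print(as_int64.nbytes, as_int8.nbytes)
```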

Runtime test (as we are focusing on performance with this post) -

In [17]: # Input arrays
    ...: aa = np.random.rand(1000,1000)
    ...: bb = np.random.rand(1000,1000)
    ...: 

In [18]: %timeit ((aa>0.5) & (bb>0.5)).astype(int)
    ...: %timeit np.logical_and(aa>0.5, bb>0.5).astype(int)
    ...: %timeit np.where((aa>0.5) & (bb>0.5),1,0)
    ...: %timeit np.where(np.logical_and(aa>0.5, bb>0.5), 1, 0)
    ...: 
100 loops, best of 3: 9.13 ms per loop
100 loops, best of 3: 9.16 ms per loop
100 loops, best of 3: 10.4 ms per loop
100 loops, best of 3: 10.4 ms per loop

In [19]: %timeit ((aa>0.5) & (bb>0.5)).astype('int8')
    ...: %timeit np.logical_and(aa>0.5, bb>0.5).astype('int8')
    ...: %timeit ((aa>0.5) & (bb>0.5)).astype('uint8')
    ...: %timeit np.logical_and(aa>0.5, bb>0.5).astype('uint8')
    ...: 
    ...: %timeit ((aa>0.5) & (bb>0.5)).astype(np.int8)
    ...: %timeit np.logical_and(aa>0.5, bb>0.5).astype(np.int8)
    ...: %timeit ((aa>0.5) & (bb>0.5)).astype(np.uint8)
    ...: %timeit np.logical_and(aa>0.5, bb>0.5).astype(np.uint8)
    ...: 
100 loops, best of 3: 5.6 ms per loop
100 loops, best of 3: 5.61 ms per loop
100 loops, best of 3: 5.63 ms per loop
100 loops, best of 3: 5.63 ms per loop
100 loops, best of 3: 5.62 ms per loop
100 loops, best of 3: 5.62 ms per loop
100 loops, best of 3: 5.62 ms per loop
100 loops, best of 3: 5.61 ms per loop

In [20]: %timeit 1 * ((aa > 0.5) & (bb > 0.5)) #@BPL's vectorized soln
100 loops, best of 3: 10.2 ms per loop
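
Outside IPython, where `%timeit` is not available, a similar comparison could be sketched with the standard-library `timeit` module. The repeat and call counts below, along with the `candidates` and `timings` names, are arbitrary choices for this sketch, and the numbers it prints will vary with the machine and the NumPy version.

```python
import timeit
import numpy as np

aa = np.random.rand(1000, 1000)
bb = np.random.rand(1000, 1000)

# A subset of the approaches above, wrapped as zero-argument callables.
candidates = {
    "astype(int)": lambda: ((aa > 0.5) & (bb > 0.5)).astype(int),
    "astype(np.int8)": lambda: ((aa > 0.5) & (bb > 0.5)).astype(np.int8),
    "np.where": lambda: np.where((aa > 0.5) & (bb > 0.5), 1, 0),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 10 calls each, reported as milliseconds per call.
    best = min(timeit.repeat(func, repeat=3, number=10)) / 10
    timings[name] = best * 1e3
    print(f"{name}: {timings[name]:.2f} ms per call")
```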

Thanks for such a detailed answer. If I create a mask array of size 100 x 100 using the method you specified, how would I apply it to, say, a 100 x 100 array "arr", so that True gives me the actual value of "arr" at that location and False gives me -1?
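
One possible way to do what the follow-up comment above asks is the three-argument form of `np.where`, keeping the mask boolean instead of converting it to int. This is a sketch under assumptions: the random contents of `arr` and the name `filtered` are hypothetical and only stand in for the commenter's data.

```python
import numpy as np

rng = np.random.default_rng(0)
aa = rng.random((100, 100))
bb = rng.random((100, 100))
arr = rng.random((100, 100))  # hypothetical stand-in for the commenter's "arr"

# Boolean mask: True where both aa and bb exceed 0.5.
mask = (aa > 0.5) & (bb > 0.5)

# Keep arr where the mask is True, use -1 everywhere else.
filtered = np.where(mask, arr, -1)
```

If the mask has already been converted to 0/1 integers, `mask.astype(bool)` turns it back into a boolean array before passing it to `np.where`.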

BPL · Accepted Answer · 2016-08-14 15:05:28Z

2

What about this?

import numpy as np

aa = np.random.rand(5, 5)
bb = np.random.rand(5, 5)

print(aa)
print(bb)

cc = 1 * ((aa > 0.5) & (bb > 0.5))
print(cc)

edited Aug 14, 2016 at 15:05

user2285236

answered Aug 14, 2016 at 15:04

BPL


Abhishek Soni · Accepted Answer · 2016-08-14 15:09:57Z

-3

When the elements of aa and bb at a given index both exceed 0.5, the new array gets 1 at that index.

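import numpy as np

aa = np.random.rand(5, 5)
bb = np.random.rand(5, 5)
new_arr = np.zeros((5, 5), dtype=int)  # preallocate the result
for i in range(5):
    for j in range(5):
        # compare individual elements, not whole rows
        if aa[i, j] > 0.5 and bb[i, j] > 0.5:
            new_arr[i, j] = 1
        else:
            new_arr[i, j] = 0  # or any value you want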

answered Aug 14, 2016 at 15:09

Abhishek Soni


3 Comments

Oscar Smith Over a year ago

Using Numpy without using ufuncs (especially if they exist) is never the answer.

Eugene Lisitsky Over a year ago

You can get real performance troubles working with numpy this way.

rayryeng Over a year ago

The purpose of numpy is to exploit vectorization. This is clearly not the best way to use numpy or you've written this question without knowing how numpy works.
