6

I have a 2D numpy array with 'n' unique values. I want to produce a binary matrix in which every occurrence of a value I specify is set to 'one' and all other values are set to 'zero'.

For example, I have an array as follows and I want all instances of 35 to be assigned 'one':

array([[12, 35, 12, 26],
       [35, 35, 12, 26]])

I am trying to get the following output:

array([[0, 1, 0, 0],
       [1, 1, 0, 0]])

What is the most efficient way to do this in Python?

1
  • 2
    Use numpy.zeros(), but save the indices of your wanted values. Afterwards, set the entries at those indices to one. Commented Jan 17, 2018 at 19:10

6 Answers

14
import numpy as np
x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])
(x == 35).astype(int)

will give you:

array([[0, 1, 0, 0],
       [1, 1, 0, 0]])

The == operator in numpy performs an element-wise comparison, and when converting booleans to ints True is encoded as 1 and False as 0.
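To spell out the two steps described above, here is a minimal sketch. The helper name binary_mask and its dtype parameter are hypothetical and only serve the illustration; they are not part of NumPy.

```python
import numpy as np

def binary_mask(arr, value, dtype=np.uint8):
    """Hypothetical helper: 1 where arr == value, 0 elsewhere."""
    mask = arr == value          # element-wise comparison -> boolean array
    return mask.astype(dtype)    # True -> 1, False -> 0

x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])
result = binary_mask(x, 35)
print(result)
```

Passing dtype=int instead reproduces the (x == 35).astype(int) call shown above.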


4 Comments

@kmario23 Well, the fastest one would be (x == 35).astype() with one of the 8-bit dtypes - stackoverflow.com/a/38988035. Hence, it would be something like (x == 35).astype(np.uint8).
@kmario23 And leverage the numexpr module for large arrays - import numexpr as ne; ne.evaluate('x==35').astype(np.uint8) - for a further speedup. Also, we can just view the result - (x == 35).view(np.uint8), etc. - if views are okay.
@Divakar: numexpr is indeed the fastest by far; thanks for this nice alternative.
@Divakar Thanks for all your suggestions! Added comprehensive timings below, including your suggested approaches :)
7

Another elegant option, compared to the other solutions, is to use np.isin():

>>> arr
array([[12, 35, 12, 26],
       [35, 35, 12, 26]])

# get the result as binary matrix
>>> np.isin(arr, 35).astype(np.uint8)
array([[0, 1, 0, 0],
       [1, 1, 0, 0]])

np.isin() returns a boolean mask that is True where the given element (here 35) is present in the original array, and False elsewhere.
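Where np.isin() becomes more useful than a plain == is when several values should map to 1 at once. A small sketch, assuming you also want 26 flagged (that second target is only an illustrative choice, not part of the question):

```python
import numpy as np

arr = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])

# Mark every element that equals 35 or 26; any sequence of targets works.
multi = np.isin(arr, [35, 26]).astype(np.uint8)
print(multi)
```

For a single target value, the comparison arr == 35 produces the same mask.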


Another variant would be to cast the boolean result using np.asarray() with data type np.uint8 for better speed:

In [18]: np.asarray(np.isin(x, 35), dtype=np.uint8)
Out[18]: 
array([[0, 1, 0, 0],
       [1, 1, 0, 0]], dtype=uint8)

Benchmarking

By explicitly casting the boolean result to uint8, we can gain more than 3x better performance. (Thanks to @Divakar for pointing this out!) See timings below:

# setup (large) input array
In [3]: x = np.arange(25000000)
In [4]: x[0] = 35
In [5]: x[1000000] = 35
In [6]: x[2000000] = 35
In [7]: x[-1] = 35
In [8]: x = x.reshape((5000, 5000))

# timings
In [20]: %timeit np.where(x==35, 1, 0)
427 ms ± 25.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [21]: %timeit (x == 35) + 0
450 ms ± 72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [22]: %timeit (x == 35).astype(np.uint8)
126 ms ± 37.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# marginally the fastest of the copying NumPy casts in this run
In [23]: %timeit np.isin(x, 35).astype(np.uint8)
115 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [24]: %timeit np.asarray(np.isin(x, 35), dtype=np.uint8)
117 ms ± 2.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

If you want a real warhorse, then use numexpr as in:

In [8]: import numexpr as ne

In [9]: %timeit ne.evaluate("x==35").astype(np.uint8)
23 ms ± 2.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

This is roughly 20x faster than the slowest NumPy-based approach above (450 ms vs. 23 ms).


Finally, if views are okay, we can get similarly dramatic speedups with NumPy itself:

In [13]: %timeit (x == 35).view(np.uint8)
20.1 ms ± 93.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [15]: %timeit np.isin(x, 35).view(np.uint8)
30.2 ms ± 1.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

(Again, thanks to @Divakar for mentioning these super nice tricks!)
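What "if views are okay" means in practice: .view(np.uint8) reinterprets the boolean array's existing bytes instead of copying them, which works because a NumPy bool occupies one byte. The sketch below is my own illustration of that trade-off, not part of the timings above:

```python
import numpy as np

x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])

mask = x == 35
as_view = mask.view(np.uint8)    # reinterprets the same bytes, no copy
as_copy = mask.astype(np.uint8)  # allocates a new array

print(np.shares_memory(mask, as_view))  # the view shares the mask's buffer
print(np.shares_memory(mask, as_copy))  # the copy does not

# Because the view shares memory, writing through it changes the mask too.
as_view[0, 0] = 1
print(mask[0, 0])
```

If the mask is needed again later, or the result will be modified, the copying astype() variant is the safer choice.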


4
import numpy as np
x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])
(x == 35) + 0

array([[0, 1, 0, 0],
       [1, 1, 0, 0]])

1 Comment

Oh, very clever. I went ahead and did a minor edit. The thought is good; however, I think that boolean_array + 0 is not the most efficient way to convert data types.
4

Another option is to use np.where. This solution is slower than @yuji's solution (see timings below), but it is more flexible if you want to do anything other than filling in zeros and ones (see example below).

import numpy as np
x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])
np.where(x==35, 1, 0)

which yields

array([[0, 1, 0, 0],
       [1, 1, 0, 0]])

One can read it as: where x is equal to 35, put in a 1; everywhere else, insert a 0.

As noted, this gives you a lot of flexibility. For example, you can also do the following:

np.where(x==35, np.sqrt(x), x - 3)

array([[  9.        ,   5.91607978,   9.        ,  23.        ],
       [  5.91607978,   5.91607978,   9.        ,  23.        ]])

So wherever x is equal to 35 you get the square root, and 3 is subtracted from all other values.
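As a further sketch of the same flexibility, the two branches can just as well be the array itself or arbitrary labels. The fill value -1 and the strings "hit"/"miss" below are arbitrary choices for illustration:

```python
import numpy as np

x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])

# Keep matching entries, replace everything else with -1.
kept = np.where(x == 35, x, -1)
print(kept)

# The branches do not have to be numeric; here they are string labels.
labels = np.where(x == 35, "hit", "miss")
print(labels)
```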

Timings:

%timeit np.where(x==35, 1, 0)
100000 loops, best of 3: 5.85 µs per loop

%timeit (x == 35).astype(int)
100000 loops, best of 3: 3.23 µs per loop

%timeit np.isin(x, 35).astype(int)
10000 loops, best of 3: 18.7 µs per loop

%timeit (x == 35) + 0
100000 loops, best of 3: 5.85 µs per loop


2

If your array is a numpy array, you can use the '==' operator on it to get a boolean array, then use the astype method to turn that into zeros and ones.

import numpy as np
my_array = np.array([[12, 35, 12, 26],
                     [35, 35, 12, 26]])

indexed = (my_array == 35).astype(int)

print(indexed)


2

I like @yuji's approach. Very elegant!

Just for the sake of diversity, here is another answer that takes a lot more labor:

>>> import numpy as np
>>> x = np.array([[12, 35, 12, 26],[35, 35, 12, 26]])
>>> x
array([[12, 35, 12, 26],
       [35, 35, 12, 26]])
>>> y=np.zeros(x.shape)
>>> y[np.where(x==35)] = np.ones(len(np.where(x==35)[0]))
>>> y
array([[ 0.,  1.,  0.,  0.],
       [ 1.,  1.,  0.,  0.]])
>>> 
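The same zeros-then-fill idea can be written more compactly with a boolean mask as the index. This is a sketch of an alternative, not the code from this answer; it also assumes an integer result is wanted instead of the float default of np.zeros:

```python
import numpy as np

x = np.array([[12, 35, 12, 26], [35, 35, 12, 26]])

y = np.zeros(x.shape, dtype=int)  # integer zeros rather than the float default
y[x == 35] = 1                    # the scalar 1 is broadcast to every selected position
print(y)
```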

1 Comment

@juanpa.arrivillaga I agree and have edited my answer, but it often works well.
