I have a rather large NumPy array, about 1 million elements. It contains only about 8 distinct values, the integers 1-8.

Given a number, say 2, I would like to recode all 2s to 1 and everything else to 0,

i.e.
2 ==> 1
1, 3, 4, 5, 6, 7, 8 ==> 0

Is there a pythonic way to do this with numpy?


[1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8]=> [0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0]

Thanks

2 Answers

That's the result of a == 2 for a NumPy array a:

>>> import numpy
>>> a = numpy.random.randint(1, 9, size=20)
>>> a
array([4, 5, 1, 2, 5, 7, 2, 5, 8, 2, 4, 6, 6, 1, 8, 7, 1, 7, 8, 7])
>>> a == 2
array([False, False, False,  True, False, False,  True, False, False,
        True, False, False, False, False, False, False, False, False,
       False, False], dtype=bool)
>>> (a == 2).astype(int)
array([0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
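
As a minimal sketch, the same comparison can be applied to the sample array from the question. The helper name recode_to_indicator and the choice of int8 as the output dtype are assumptions made for illustration (a small dtype keeps a 1-million-element result compact); neither comes from the original answer.

```python
import numpy as np

def recode_to_indicator(values, target, dtype=np.int8):
    """Return 1 where values == target and 0 elsewhere (hypothetical helper)."""
    # The comparison yields a boolean array; astype converts True/False to 1/0.
    return (np.asarray(values) == target).astype(dtype)

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8])
result = recode_to_indicator(a, 2)
print(result.tolist())
# [0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```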

If you want to change a in place, an efficient way to do so is numpy.equal() with the out argument, which writes the result directly into a instead of returning a new array:

>>> numpy.equal(a, 2, out=a)
array([0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
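
A short, self-contained sketch of the in-place variant, using a hand-written array rather than random data so the result is reproducible. It assumes a has an integer dtype: the boolean comparison result is cast to that dtype on the way in, so a keeps its original dtype and the original values are overwritten.

```python
import numpy as np

a = np.array([4, 5, 1, 2, 5, 7, 2, 5, 8, 2])
original_dtype = a.dtype

# Write the comparison result back into a itself; the old values are lost.
np.equal(a, 2, out=a)

print(a.tolist())
# [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
print(a.dtype == original_dtype)
# True
```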

I'd probably use np.where for this:

>>> import numpy as np
>>> a = np.array([[1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8]])
>>> np.where(a==2, 1, 0)
array([[0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])
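
One thing np.where makes easy is substituting outputs other than 1 and 0. The sketch below is illustrative only; the "hit"/"miss" labels and the -1 sentinel are assumptions, not part of the question.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8])

# Same recoding as the comparison-based approach: 1 where a == 2, else 0.
flags = np.where(a == 2, 1, 0)

# The two output values are arbitrary, so other encodings need no extra step.
labels = np.where(a == 2, "hit", "miss")

# Keep matching elements as they are and replace everything else with -1.
sentinel = np.where(a == 2, a, -1)

print(flags.tolist())
print(labels[:3].tolist())
print(sentinel[:3].tolist())
```

For the plain 1/0 case the direct comparison does the same job with one call fewer, as the comment on this answer points out.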

2 Comments

Using numpy.where() on the result of a == 2 here seems redundant to me, and it is less efficient than simply not calling it. Any rationale for it?
Only that I tend to like solutions which are robust to perturbations in the values; no real motivation otherwise.
