
I have a numpy array X with shape (768, 8).

The last value in each row is either 0 or 1. I want only the rows where that value is 1, and I want to call the result T.

I did:

T = [x for x in X if x[7]==1]

This selects the correct rows, but the result is now a list, not a numpy array (in fact, I cannot print T.shape).

What should I do instead to keep this a numpy array?

2
  • Why not just T = np.array(T)? Commented Jul 21, 2016 at 15:36
  • Ok, so just keep the code for T and transform it back to a numpy array? Commented Jul 21, 2016 at 15:38

2 Answers

2

NumPy's boolean indexing gets the job done in a fully vectorized manner. This approach is generally more efficient (and arguably more elegant) than using list comprehensions and type conversions.

T = X[X[:, -1] == 1]

Demo:

In [232]: first_columns = np.random.randint(0, 10, size=(10, 7))

In [233]: last_column = np.random.randint(0, 2, size=(10, 1))

In [234]: X = np.hstack((first_columns, last_column))

In [235]: X
Out[235]: 
array([[4, 3, 3, 2, 6, 2, 2, 0],
       [2, 7, 9, 4, 7, 1, 8, 0],
       [9, 8, 2, 1, 2, 0, 5, 1],
       [4, 4, 4, 9, 6, 4, 9, 1],
       [9, 8, 7, 6, 4, 4, 9, 0],
       [8, 3, 3, 2, 9, 5, 5, 1],
       [7, 1, 4, 5, 2, 4, 7, 0],
       [8, 0, 0, 1, 5, 2, 6, 0],
       [7, 9, 9, 3, 9, 3, 9, 1],
       [3, 1, 8, 7, 3, 2, 9, 0]])

In [236]: mask = X[:, -1] == 1

In [237]: mask
Out[237]: array([False, False,  True,  True, False,  True, False, False,  True, False], dtype=bool)

In [238]: T = X[mask]

In [239]: T
Out[239]: 
array([[9, 8, 2, 1, 2, 0, 5, 1],
       [4, 4, 4, 9, 6, 4, 9, 1],
       [8, 3, 3, 2, 9, 5, 5, 1],
       [7, 9, 9, 3, 9, 3, 9, 1]])

1 Comment

You can also use X.compress(mask, axis=0) as a more explicit, lower-overhead alternative. If you have row indices instead of a mask, you can use X.take(rowxs, axis=0). Numpy's fancy indexing calls these functions under the hood.
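To make the comment concrete, here is a small sketch comparing the three spellings on a toy array. The array contents and the names `rows`, `by_mask`, `by_compress`, and `by_take` are assumptions for illustration; `np.flatnonzero` is used here only to turn the mask into row indices.

```python
import numpy as np

# Toy array; the last column plays the role of the 0/1 flag.
X = np.array([[4, 3, 0],
              [9, 8, 1],
              [7, 1, 0],
              [8, 3, 1]])

mask = X[:, -1] == 1
rows = np.flatnonzero(mask)  # integer indices of the True entries

by_mask = X[mask]                        # boolean indexing
by_compress = X.compress(mask, axis=0)   # explicit mask-based selection
by_take = X.take(rows, axis=0)           # explicit index-based selection

print(np.array_equal(by_mask, by_compress))  # True
print(np.array_equal(by_mask, by_take))      # True
```

All three return the same rows. Whether `compress` or `take` is measurably faster likely depends on the array size and NumPy version, so it is worth timing on real data before switching.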
1

By calling

T = [x for x in X if x[7]==1]

you are creating T as a list. To convert any list to a numpy array, just use:

T = numpy.array([x for x in X if x[7]==1])

Here is what happens:

In [1]: import numpy as np 

In [2]: a = [1,2,3,4]

In [3]: a.T
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-9f69ed463660> in <module>()
----> 1 a.T

AttributeError: 'list' object has no attribute 'T'

In [4]: a = np.array(a)

In [5]: a.T
Out[5]: array([1, 2, 3, 4])
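As a sketch of how the list-then-convert approach behaves on a 2-D array, the example below compares it with boolean indexing. The toy array and the names `T_list`, `T_mask`, `E_list`, and `E_mask` are assumptions for illustration. The two approaches agree when rows match, but differ in shape when nothing matches.

```python
import numpy as np

# Toy array; the last column plays the role of the 0/1 flag.
X = np.array([[4, 3, 0],
              [9, 8, 1],
              [7, 1, 0],
              [8, 3, 1]])

# List comprehension wrapped in np.array vs. boolean indexing.
T_list = np.array([x for x in X if x[-1] == 1])
T_mask = X[X[:, -1] == 1]
print(T_list.shape, T_mask.shape)  # (2, 3) (2, 3)

# Edge case: no row matches (5 never appears in the last column).
E_list = np.array([x for x in X if x[-1] == 5])
E_mask = X[X[:, -1] == 5]
print(E_list.shape)  # (0,)
print(E_mask.shape)  # (0, 3)
```

With an empty match, converting the empty list loses the column dimension, while boolean indexing keeps it. Code that later relies on `T.shape[1]` is therefore safer with the mask.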


3 Comments

Ok, so I need to go through an intermediate list and then convert back to a numpy array. There is no direct conversion.
@user: Check the edit. You can just say, numpy.array([x for x in X if x[7]==1])
Yes, that is the same. I was just wondering if there was a preferred numpy operation. Anyway, this works, thanks.
