Select certain rows (condition met), but only some columns in Python/Numpy

Question

I have a NumPy array with 4 columns and want to select columns 1, 3 and 4 of the rows where the value of the second column meets a certain condition (i.e. equals a fixed value). I first tried to select only the rows, keeping all 4 columns, via:

I = A[A[:,1] == i]

which works. Then I tried (similar to MATLAB, which I know very well):

I = A[A[:,1] == i, [0,2,3]]

which doesn't work. How can I do this?
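A minimal sketch of why the MATLAB-style attempt fails, using the example data below: NumPy turns the boolean row mask into integer row indices and then tries to pair them elementwise with the column list, instead of forming a row-by-column grid as MATLAB does.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
i = 2

mask = A[:, 1] == i            # boolean row mask: [True, False, True]
print(np.flatnonzero(mask))    # the matching row indices: [0 2]

try:
    # The mask stands for 2 row indices, the list holds 3 column indices;
    # NumPy tries to pair them up elementwise, and shapes (2,) and (3,) cannot broadcast.
    A[mask, [0, 2, 3]]
except IndexError as exc:
    print("IndexError:", exc)
```

Note that if the number of matching rows happened to equal the number of listed columns, the same expression would not raise at all; it would silently return a 1-D array of paired elements A[r0, c0], A[r1, c1], ... rather than a sub-matrix.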

EXAMPLE DATA:

>>> import numpy as np
>>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
>>> print(A)
[[1 2 3 4]
 [6 1 3 4]
 [3 2 5 6]]
>>> i = 2

# I want to get columns 1, 3 and 4 (0-based indices 0, 2 and 3)
# for every row that has the value i in the second column.
# In this case, that is rows 1 and 3, restricted to columns 1, 3 and 4:
[[1 3 4]
 [3 5 6]]

I am currently using this:

I = A[A[:,1] == i]
I = I[:, [0,2,3]]

But I thought there had to be a nicer way of doing it (I am used to MATLAB).

And apart from that, I have to admit that I don't really understand that indexing either; it is very different from MATLAB. (tim, commented May 28, 2014 at 13:03)

@tim: Could you please post the array and the output you expect? (Ankur Ankan, commented May 28, 2014 at 13:06)

Community · Accepted Answer · 2017-05-23 12:18:01Z

38

>>> import numpy as np
>>> a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
>>> a
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

>>> a[a[:,0] > 3] # select rows where first column is greater than 3
array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

>>> a[a[:,0] > 3][:,np.array([True, True, False, True])] # then keep columns 0, 1 and 3 (boolean column mask)
array([[ 5,  6,  8],
       [ 9, 10, 12]])

# equivalent, in a single indexing operation
>>> a[np.ix_(a[:,0] > 3, np.array([True, True, False, True]))]
array([[ 5,  6,  8],
       [ 9, 10, 12]])

For an explanation of the obscure np.ix_(), see https://stackoverflow.com/a/13599843/4323
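As a rough illustration of what np.ix_() does (a sketch on the same 3x4 array a): it turns a row selector and a column selector into index arrays shaped as a column and a row, so that indexing with them broadcasts to the full row-by-column grid, which is the MATLAB-style A(rows, cols) behaviour. Boolean inputs are first converted to the positions of their True entries.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

rows, cols = np.ix_(a[:, 0] > 3, np.array([True, True, False, True]))
print(rows.shape, cols.shape)      # (2, 1) (1, 3)
print(rows.ravel(), cols.ravel())  # [1 2] [0 1 3]

# Broadcasting the (2, 1) rows against the (1, 3) cols gives a 2x3 grid
# of (row, col) pairs, so a[rows, cols] is a 2x3 sub-array.
print(a[rows, cols])
```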

Finally, we can simplify by giving the list of column numbers instead of the tedious boolean mask:

>>> a[np.ix_(a[:,0] > 3, (0,1,3))]
array([[ 5,  6,  8],
       [ 9, 10, 12]])
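Applied to the data from the question, the same one-step np.ix_() selection with a list of column indices gives the expected result (a short sketch; the two-step version from the question is included only for comparison):

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
i = 2

# Rows whose second column (index 1) equals i; columns 1, 3 and 4 (indices 0, 2, 3).
result = A[np.ix_(A[:, 1] == i, [0, 2, 3])]
print(result)
# [[1 3 4]
#  [3 5 6]]

two_step = A[A[:, 1] == i][:, [0, 2, 3]]   # the question's workaround
print(np.array_equal(result, two_step))    # True
```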

edited May 23, 2017 at 12:18 by Community Bot

answered May 28, 2014 at 13:16 by John Zwinck


4 Comments

tim Over a year ago

So are two consecutive selections really necessary?

John Zwinck Over a year ago

If you're wishing you could write a[x][y] where x and y are boolean masks: yes, I wish that too, but it does not work. This seems to be a known limitation; I don't know why, but it hardly matters here.

tim Over a year ago

Not only that: I wished to be able to select the rows and columns in ONE single statement like this: A[row_indices_to_select, column_indices_to_select], where row_indices_to_select would come from the condition I wanted to apply. :(

John Zwinck Over a year ago

I've added some more solutions--I like the last one using ix_() with a tuple.
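If you want a single A[row_indices, column_indices] expression without np.ix_(), one option (a sketch, not taken from the answers here) is to turn the boolean condition into integer row indices with np.flatnonzero() and give them a trailing axis, so that they broadcast against the column list:

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
i = 2

# Integer positions of the rows that satisfy the condition, as a column vector.
row_indices = np.flatnonzero(A[:, 1] == i)[:, np.newaxis]   # shape (2, 1)
column_indices = [0, 2, 3]                                  # treated as shape (3,)

# (2, 1) broadcast against (3,) gives a 2x3 grid of (row, col) pairs.
result = A[row_indices, column_indices]
print(result)
```

This is essentially the pair of index arrays that np.ix_() builds for you.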

Taha · Accepted Answer · 2014-05-28 13:36:59Z

6

If you want to use column indices rather than a boolean mask, you can write it this way:

A[:, [0, 2, 3]][A[:, 1] == i]

Going back to your example:

>>> import numpy as np
>>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
>>> print(A)
[[1 2 3 4]
 [6 1 3 4]
 [3 2 5 6]]
>>> i = 2
>>> print(A[:, [0, 2, 3]][A[:, 1] == i])
[[1 3 4]
 [3 5 6]]
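Selecting the columns first (as here) or the rows first gives the same result; both are two consecutive indexing operations, and each fancy or boolean index creates an intermediate copy. Note that the mask must be computed on the full array, since column 1 is not among the kept columns. A small check on the question's data, next to the single-step np.ix_() form:

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
i = 2
mask = A[:, 1] == i      # computed on the full array, before dropping column 1
cols = [0, 2, 3]

cols_first = A[:, cols][mask]       # this answer: columns, then rows
rows_first = A[mask][:, cols]       # the question's workaround: rows, then columns
one_step = A[np.ix_(mask, cols)]    # single indexing operation

print(np.array_equal(cols_first, rows_first), np.array_equal(rows_first, one_step))
```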


answered May 28, 2014 at 13:36 by Taha

1 Comment

tim Over a year ago

Boolean positions are actually fine for me; I just wanted to do the selection in ONE step rather than in two consecutive selections (which your solution does, doesn't it?), for performance reasons.

Fza · Accepted Answer · 2016-03-15 04:51:56Z

3

>>> import numpy as np
>>> a = np.array([[1,2,3], [1,3,4], [2,2,5]])
>>> a[a[:,0] == 1][:, [0,1]]  # rows where column 0 equals 1, then columns 0 and 1
array([[1, 2],
       [1, 3]])

answered Mar 15, 2016 at 4:51 by Fza


genclik27 · Accepted Answer · 2014-05-28 13:33:48Z

1

This also works.

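# Keep every column except the condition column (index 1), for rows whose second column equals i
I = np.array([row[[x for x in range(A.shape[1]) if x != 1]] for row in A if row[1] == i])
print(I)

Edit: since indexing starts from 0, the second column has index 1.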
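The same idea (keep the matching rows and every column except the condition column) can also be written without a Python-level loop. A sketch using np.delete(), which returns a new array with the given column removed; condition_col is just an illustrative variable name:

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
i = 2
condition_col = 1   # the second column, 0-based

# Keep rows whose condition column equals i, then drop that column.
I = np.delete(A[A[:, condition_col] == i], condition_col, axis=1)
print(I)
```

This only matches the question's column choice because the dropped column happens to be the condition column; for an arbitrary column list, np.ix_() is the more general tool.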

answered May 28, 2014 at 13:33 by genclik27

3 Comments

Taha Over a year ago

The algorithm must be correct, but it is not very pythonic.

genclik27 Over a year ago

@Taha Maybe not, but it saves you the double selection. The idea is simple: first choose the columns, then iterate over the rows.

Taha Over a year ago

@genclik27 I understood what you did. But lately I have been doing numerical computation with large matrices, and I always need vectorized calculations. The problem with what you propose is that it creates a new list, so you cannot change the values directly in the matrix this way. It is indeed useful if you don't need to change the values of A.
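To illustrate the point about changing values in place (a sketch on the question's data): a fancy or boolean index used on the right-hand side returns a copy, so editing the result leaves A untouched, but the same np.ix_() index used on the left-hand side of an assignment writes directly into A. The replacement value 0 is arbitrary.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
i = 2
index = np.ix_(A[:, 1] == i, [0, 2, 3])

sub = A[index]       # a copy of the selected block
sub[:] = 0           # changes the copy only
print(A[0])          # [1 2 3 4], unchanged

A[index] = 0         # assignment through the index modifies A itself
print(A)
```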

Aman · Accepted Answer · 2014-12-21 06:21:19Z

1

I hope this answers your question. Here is a piece of code I have written using pandas:

df_targetrows = df.loc[<condition on df[col2filter]>, [col1, col2, ..., coln]]

For example,

targets = stockdf.loc[stockdf['rtns'] > .04, ['symbol','date','rtns']]

This returns a DataFrame containing only the columns ['symbol', 'date', 'rtns'] of stockdf, restricted to the rows where stockdf['rtns'] > .04.

Hope this helps.

edited Dec 21, 2014 at 6:21 by Aman

answered Dec 21, 2014 at 5:29 by 5up3rf1u0u5
