27

I have a NumPy array with 4 columns and want to select columns 1, 3 and 4 for the rows where the value of the second column meets a certain condition (i.e. equals a fixed value). I first tried to select only the rows, keeping all 4 columns, via:

I = A[A[:,1] == i]

which works. Then I tried the following (similar to MATLAB, which I know very well):

I = A[A[:,1] == i, [0,2,3]]

which doesn't work. How can I do it?


EXAMPLE DATA:

 >>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
 >>> print(A)
 [[1 2 3 4]
  [6 1 3 4]
  [3 2 5 6]]
 >>> i = 2
     
 # I want to get the columns 1, 3 and 4 
 # for every row which has the value i in the second column. 
 # In this case, this would be row 1 and 3 with columns 1, 3 and 4:
 [[1 3 4]
  [3 5 6]]
 

I am currently using this:

I = A[A[:,1] == i]
I = I[:, [0,2,3]]

But I thought that there had to be a nicer way of doing it... (I am used to MATLAB)

5
  • A[A[:,1] == i][0,2,3] didn't work either? Commented May 28, 2014 at 12:50
  • I = A[A[:,1] == i][0,2,3] --> IndexError: too many indices Commented May 28, 2014 at 13:01
  • And apart from that, I have to admit that I wouldn't really understand that indexing either; it is very different from MATLAB... Commented May 28, 2014 at 13:03
  • @tim: Could you please post the array and what output do you expect? Commented May 28, 2014 at 13:06
  • @Ankur Ankan: edited into the question. Commented May 28, 2014 at 13:13

5 Answers

38
>>> a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
>>> a
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

>>> a[a[:,0] > 3] # select rows where first column is greater than 3
array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

>>> a[a[:,0] > 3][:,np.array([True, True, False, True])] # select columns
array([[ 5,  6,  8],
       [ 9, 10, 12]])

# fancier equivalent of the previous
>>> a[np.ix_(a[:,0] > 3, np.array([True, True, False, True]))]
array([[ 5,  6,  8],
       [ 9, 10, 12]])

For an explanation of the obscure np.ix_(), see https://stackoverflow.com/a/13599843/4323

Finally, we can simplify by giving the list of column numbers instead of the tedious boolean mask:

>>> a[np.ix_(a[:,0] > 3, (0,1,3))]
array([[ 5,  6,  8],
       [ 9, 10, 12]])
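
Applied to the array from the question, the same np.ix_() pattern might look like the sketch below. The helper name select_block is made up for illustration and is not part of NumPy; the data and the value of i are taken from the question.

```python
import numpy as np

def select_block(arr, row_mask, cols):
    """Return the rows where row_mask is True, restricted to the given columns."""
    # np.ix_ combines the boolean row mask and the column list into an open
    # mesh, so indexing with it yields the full rows-by-columns block.
    return arr[np.ix_(row_mask, cols)]

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
i = 2

# Rows whose second column equals i, restricted to columns 0, 2 and 3.
result = select_block(A, A[:, 1] == i, [0, 2, 3])
print(result)
```

This should give the same array as the two-step A[A[:,1] == i][:, [0,2,3]] from the question.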

4 Comments

So really two consecutive selections necessary?
If you were hoping to write a[x][y] with x and y as boolean masks, yes, I wish that worked too, but it does not: the second index is applied to the rows of a[x] again, not to its columns. It's hardly important here.
Not only that, I wished to be able to select the rows and columns in ONE single statement like this: A[row_indices_to_select, column_indices_to_select], where row_indices_to_select would come from the condition I wanted to apply. :(
I've added some more solutions--I like the last one using ix_() with a tuple.
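
As a rough illustration of why a plain A[row_indices, column_indices] does not select a block: NumPy pairs two 1-D index arrays element by element. Reshaping the row indices into a column lets them broadcast against the column indices, which is essentially what np.ix_() arranges. The variable names below are chosen for this sketch only.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
i = 2

rows = np.flatnonzero(A[:, 1] == i)  # integer row indices: [0, 2]
cols = np.array([0, 2, 3])

# Two 1-D index arrays of equal length are paired element by element:
# this picks A[0, 0] and A[2, 3], not a block.
paired = A[rows, np.array([0, 3])]

# Giving the row indices shape (2, 1) lets them broadcast against the
# (3,) column indices, which selects the full 2x3 block in one statement.
block = A[rows[:, None], cols]

print(paired)
print(block)
```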
6

If you do not want to use boolean positions but the indexes, you can write it this way:

A[:, [0, 2, 3]][A[:, 1] == i]

Going back to your example:

>>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
>>> print(A)
[[1 2 3 4]
 [6 1 3 4]
 [3 2 5 6]]
>>> i = 2
>>> print(A[:, [0, 2, 3]][A[:, 1] == i])
[[1 3 4]
 [3 5 6]]

Seriously,

1 Comment

Boolean positions actually are okay for me, I just would have wanted to do the selection in ONE step and not in two consecutive selections (which your solution is doing, isn't it?) because of performance reasons.
3
>>> a=np.array([[1,2,3], [1,3,4], [2,2,5]])
>>> a[a[:,0]==1][:,[0,1]]
array([[1, 2],
       [1, 3]])


1

This also works.

I = np.array([row[[x for x in range(A.shape[1]) if x != i-1]] for row in A if row[i-1] == i])
print(I)

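Edit: since indexing starts from 0, the column index i-1 should be used. (In the example the second column has index 1, which happens to equal i-1 for i = 2.)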

3 Comments

The algorithm must be correct, but it is not very pythonic.
@Taha maybe not, but it saves you the double selection. The idea is actually simple: first choose the columns, then iterate over the rows.
@genclik27 I understood what you did. But lately I have been doing numerical computation with large matrices, and I have always needed vectorized calculations. The problem with what you are proposing is that you create a new list, so you cannot change the values directly in the matrix this way. It is indeed useful if you don't need to change the values of A.
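
To make the point about changing values concrete, here is a small sketch (not from any of the answers) that reuses the np.ix_() index from the top answer for both reading and writing. Reading with such an index gives a copy, while assigning through it writes into A itself.

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [6, 1, 3, 4], [3, 2, 5, 6]])
i = 2

# Build the index once: rows whose second column equals i, columns 0, 2 and 3.
index = np.ix_(A[:, 1] == i, [0, 2, 3])

selected = A[index]  # advanced indexing returns a copy of the block
A[index] = 0         # assignment through the same index modifies A in place

print(selected)
print(A)
```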
1

I hope this answers your question. A snippet I have implemented using pandas is:

df_targetrows = df.loc[df[col2filter] <somecondition>, [col1, col2, ..., coln]]

For example,

targets = stockdf.loc[stockdf['rtns'] > .04, ['symbol','date','rtns']]

This returns a DataFrame containing only the columns ['symbol','date','rtns'] from stockdf, for the rows where the value of rtns satisfies stockdf['rtns'] > .04.

Hope this helps.
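
A self-contained version of the stockdf example could look like the sketch below. The rows and the extra volume column are invented purely for illustration; only the .loc call mirrors the line above.

```python
import pandas as pd

# Hypothetical data, made up for this sketch.
stockdf = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC"],
    "date": ["2014-05-26", "2014-05-27", "2014-05-28"],
    "rtns": [0.05, 0.01, 0.07],
    "volume": [100, 200, 300],
})

# Row condition and column list in a single .loc call.
targets = stockdf.loc[stockdf["rtns"] > 0.04, ["symbol", "date", "rtns"]]
print(targets)
```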
