
It's a bit embarrassing to ask, given how extensive the NumPy documentation is, but I'm stuck on a simple task: getting all the records for which a mask is true in a nested NumPy representation (the equivalent of dataframe.loc[cond] in pandas):

import numpy as np
a1 = np.array([1,2,3])
a2 = np.array(['a','b','c'])
a3 = np.array(['luca','paolo','francesco'])
a4 = np.array([True, False,False], dtype='bool')

combination = np.array([a1,a2,a3,a4])
print(combination)

# slice for a4 == True 
combination[combination[3] == 'True']

But the result is not what I want.

Starting from combination:

[['1' '2' '3']
 ['a' 'b' 'c']
 ['luca' 'paolo' 'francesco']
 ['True' 'False' 'False']]
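A quick check of what combination actually holds. This is a minimal sketch that reuses the question's arrays; the exact string width in the dtype (for example '<U11' vs '<U21') varies with platform and NumPy version, so only the dtype kind is checked. np.array stacks the four inputs into one 2-D array of strings, one row per input array, so each record is a column and the comparison with 'True' produces a mask over columns, not rows:

```python
import numpy as np

# Same inputs as in the question
a1 = np.array([1, 2, 3])
a2 = np.array(['a', 'b', 'c'])
a3 = np.array(['luca', 'paolo', 'francesco'])
a4 = np.array([True, False, False], dtype='bool')

combination = np.array([a1, a2, a3, a4])

# Everything is converted to a Unicode string ('U'); the width varies
print(combination.dtype.kind)  # U
# One row per input array: 4 rows (fields) x 3 columns (records)
print(combination.shape)       # (4, 3)

# Comparing the last row with 'True' gives one flag per column (record)
mask = combination[3] == 'True'
print(mask.tolist())  # [True, False, False]
print(mask.shape)     # (3,)
```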

Indexing it with combination[combination[3] == 'True'] yields:

array([['1', '2', '3']], 
      dtype='<U11')

What I actually want is:

[['1']
 ['a']
 ['luca']
 ['True']]

Any ideas on what I'm doing wrong?

P.S.: No, I can't do this in pandas, because converting the data to a pandas.DataFrame makes my RAM usage explode.

Answer

I believe you're simply missing the index for the first dimension, so the mask ends up being applied to the wrong axis:

combination[combination[3] == 'True']

should be

combination[:, combination[3] == 'True']

Note the colon.

This yields a new ndarray that keeps all of the first dimension (every row) and, in the second dimension, only the positions where the mask is True (here just index 0).
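A minimal sketch of the fix, reusing the question's arrays. It also shows that the original boolean array a4 can serve as the column mask directly, which avoids comparing against the string 'True'. The try/except at the end assumes a recent NumPy release, where a boolean index whose length does not match the indexed axis raises IndexError (the older release that produced the output in the question was more lenient):

```python
import numpy as np

a1 = np.array([1, 2, 3])
a2 = np.array(['a', 'b', 'c'])
a3 = np.array(['luca', 'paolo', 'francesco'])
a4 = np.array([True, False, False], dtype='bool')
combination = np.array([a1, a2, a3, a4])

# ':' keeps every row (field); the mask then selects columns (records)
selected = combination[:, combination[3] == 'True']
print(selected.shape)           # (4, 1)
print(selected[:, 0].tolist())  # ['1', 'a', 'luca', 'True']

# The original boolean array works as the column mask as well
same = combination[:, a4]
print(np.array_equal(selected, same))  # True

# Without the colon the mask targets axis 0 (length 4) but has length 3;
# recent NumPy versions reject this with an IndexError
raised = False
try:
    combination[combination[3] == 'True']
except IndexError as err:
    raised = True
    print("IndexError:", err)
```

Because every element of combination was already converted to a string, the selected column is still all strings; applying the mask to the original arrays instead (for example a1[a4]) keeps their dtypes.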


1 Comment

I feel like smashing the keyboard after realizing this. Thank you for your quick answer!
