I'm new to NumPy, and it's been a while since I last wrote Python.

I'm struggling to find multiple strings in a NumPy array that was built from split strings.
My data:

import numpy as np

string0 = "part0-part1-part2-part3-part4"
string1 = "part5-part6-part9-part7-part8"
string2 = "part5-part6-part1-part8-part7"

Each string is split into its parts, and the parts are combined with the original string into one array so that everything is in one place.

stringsraw = np.array([[string0], [string1], [string2]])
stringssliced = np.array(np.char.split(stringsraw, sep = '-').tolist())
stringscombined = np.squeeze(np.dstack((stringsraw, stringssliced)))

Results in:

[['part0-part1-part2-part3-part4' 'part0' 'part1' 'part2' 'part3' 'part4']
 ['part5-part6-part9-part7-part8' 'part5' 'part6' 'part9' 'part7' 'part8']
 ['part5-part6-part1-part8-part7' 'part5' 'part6' 'part1' 'part8' 'part7']]

I want to find the indices of 'part1' and 'part7':

np.where((stringscombined[2] == "part1") & (stringscombined[2] == "part7"))

The result is empty. Can anyone explain why the result is not [3, 5]?

I thought there would be a nicer way than looping through everything with a for loop.

The "wished-for" query and result would be:

np.where((stringscombined == "part6") & (stringscombined == "part7")) 
= array[[1,2,4]
        [2,2,5]]

Any help is appreciated.

2 Answers

We can first detect where the two elements will be, using np.isin:

np.isin(stringscombined,["part1","part7"])
array([[False, False,  True, False, False, False],
       [False, False, False, False,  True, False],
       [False, False, False,  True, False,  True]])

Using np.where() on this will tell us where the elements can be found. We need one more piece of information: which rows contain both "part1" and "part7":

(np.sum(stringscombined=="part1",axis=1)>0) & (np.sum(stringscombined=="part7",axis=1)>0)

array([False, False,  True])

The above tells us to take only the indices from the row with index 2 (the third row). Combining these two pieces of information into a function:

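def index_A(Array,i1,i2):
    # Rows that contain both i1 and i2 at least once.
    idx = (np.sum(Array==i1,axis=1)>0) & (np.sum(Array==i2,axis=1)>0)
    # Row and column positions of every i1 or i2 in the whole array.
    loc = np.where(np.isin(Array,[i1,i2]))
    # For each qualifying row, build [row, col, col, ...].
    hits = [np.insert(loc[1][loc[0]==i],0,i) for i in np.where(idx)[0]]
    return hits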

index_A(stringscombined,"part6","part7")
[array([1, 2, 4]), array([2, 2, 5])]
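
As a possible variant of the function above (a sketch, not a replacement): `.any(axis=1)` states "this row contains the value" more directly than `np.sum(...) > 0`. The name `rows_with_both` and the dictionary return layout are illustrative choices that do not appear in the answer, and the array is rebuilt here with a list comprehension only to keep the example self-contained.

```python
import numpy as np

def rows_with_both(array, a, b):
    """Map each row index that contains both a and b to the columns of the hits."""
    # True for rows where a occurs at least once and b occurs at least once.
    has_both = (array == a).any(axis=1) & (array == b).any(axis=1)
    # True wherever an element equals a or b.
    hit_mask = np.isin(array, [a, b])
    return {int(r): np.flatnonzero(hit_mask[r]) for r in np.flatnonzero(has_both)}

strings = [
    "part0-part1-part2-part3-part4",
    "part5-part6-part9-part7-part8",
    "part5-part6-part1-part8-part7",
]
# Same layout as in the question: full string in column 0, parts in columns 1 to 5.
stringscombined = np.array([[s, *s.split("-")] for s in strings])

result = rows_with_both(stringscombined, "part6", "part7")
for row, cols in result.items():
    print(row, cols)
# 1 [2 4]
# 2 [2 5]
```

Keeping the row index as a dictionary key, instead of inserting it at position 0 of each array, avoids mixing row and column indices in one array.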

4 Comments

Thank you very much, also for answering so quickly. Can you help me understand why my approach doesn't return any result? I just want to understand where my thinking went wrong. As I understood it, a bitwise AND should give me the result that np.isin returns.
OK, let me try: with np.where((stringscombined[2] == "part1") & (stringscombined[2] == "part7")), you are basically looking for positions where both booleans are True,
but this is impossible. Try stringscombined[2] == "part1" and stringscombined[2] == "part7" separately. The first boolean array says whether each element in the array is exactly part1; the second says whether each element is exactly part7. No element can be both.
Damn it! I didn't know that I can actually evaluate stringscombined[2] == "part1" on its own, without wrapping it in a function call. That makes it clear. Thank you very much!
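
The point made in the comments above can be checked directly. This is a minimal sketch; the variable names (`row`, `is_part1`, `both_idx`, and so on) are only illustrative, and `row` is written out by hand to match `stringscombined[2]` from the question.

```python
import numpy as np

# Same contents as stringscombined[2] in the question.
row = np.array(["part5-part6-part1-part8-part7",
                "part5", "part6", "part1", "part8", "part7"])

is_part1 = row == "part1"   # elementwise comparison, one boolean per element
is_part7 = row == "part7"

# AND: an element would have to equal "part1" and "part7" at the same time.
both_idx = np.where(is_part1 & is_part7)[0]
# OR: an element equals "part1" or "part7".
either_idx = np.where(is_part1 | is_part7)[0]

print(both_idx)    # []
print(either_idx)  # [3 5]
```

So `&` combines the two masks position by position; to collect the positions of several different values, `|` (or `np.isin`) is the operation that matches the intent.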

We can simplify dimensions a bit with:

In [475]: stringsraw = np.array([string0, string1, string2])                             
In [476]: stringsraw                                                                     
Out[476]: 
array(['part0-part1-part2-part3-part4', 'part5-part6-part9-part7-part8',
       'part5-part6-part1-part8-part7'], dtype='<U29')
In [477]: np.char.split(stringsraw, sep='-')                                             
Out[477]: 
array([list(['part0', 'part1', 'part2', 'part3', 'part4']),
       list(['part5', 'part6', 'part9', 'part7', 'part8']),
       list(['part5', 'part6', 'part1', 'part8', 'part7'])], dtype=object)
In [478]: np.stack(_)                                                                    
Out[478]: 
array([['part0', 'part1', 'part2', 'part3', 'part4'],
       ['part5', 'part6', 'part9', 'part7', 'part8'],
       ['part5', 'part6', 'part1', 'part8', 'part7']], dtype='<U5')
In [479]: arr = _                        

A list comprehension would be just as good (and fast):

In [491]: [s.split('-') for s in [string0, string1, string2]]
Out[491]: 
[['part0', 'part1', 'part2', 'part3', 'part4'],
 ['part5', 'part6', 'part9', 'part7', 'part8'],
 ['part5', 'part6', 'part1', 'part8', 'part7']]
In [492]: np.array(_)                                                                    
Out[492]: 
array([['part0', 'part1', 'part2', 'part3', 'part4'],
       ['part5', 'part6', 'part9', 'part7', 'part8'],
       ['part5', 'part6', 'part1', 'part8', 'part7']], dtype='<U5')

And then do equality tests on slices or the whole array:

In [488]: np.nonzero((arr[2]=='part1')|(arr[2]=='part7'))                                
Out[488]: (array([2, 4]),)
In [489]: arr=='part1'                                                                   
Out[489]: 
array([[False,  True, False, False, False],
       [False, False, False, False, False],
       [False, False,  True, False, False]])
In [490]: np.nonzero(_)                                                                  
Out[490]: (array([0, 2]), array([1, 2]))

In [493]: np.in1d(arr[2],['part1','part7'])                                              
Out[493]: array([False, False,  True, False,  True])

There's nothing special about numpy's handling of strings.

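np.isin also works. At the time of writing it used in1d internally (np.in1d has since been deprecated in NumPy 2.0 in favor of np.isin). If one argument is small, it actually does the repeated | as in [488]: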

In [501]: np.isin(arr,['part1','part7'])                                                 
Out[501]: 
array([[False,  True, False, False, False],
       [False, False, False,  True, False],
       [False, False,  True, False,  True]])
In [502]: np.nonzero(_)                                                                  
Out[502]: (array([0, 1, 2, 2]), array([1, 3, 2, 4]))
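
If (row, column) pairs are easier to read than two parallel index arrays, `np.argwhere` returns them directly. The sketch below also shifts the column by one, assuming the layout from the question where column 0 of the combined array holds the full string; the names `pairs` and `combined_pairs` are illustrative.

```python
import numpy as np

strings = [
    "part0-part1-part2-part3-part4",
    "part5-part6-part9-part7-part8",
    "part5-part6-part1-part8-part7",
]
arr = np.array([s.split("-") for s in strings])

mask = np.isin(arr, ["part1", "part7"])
pairs = np.argwhere(mask)          # one [row, column] pair per True in the mask
print(pairs.tolist())              # [[0, 1], [1, 3], [2, 2], [2, 4]]

# Assumption: the combined array keeps the full string in column 0,
# so every column index moves one place to the right.
combined_pairs = pairs + [0, 1]
print(combined_pairs.tolist())     # [[0, 2], [1, 4], [2, 3], [2, 5]]
```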

2 Comments

Thank you for answering so quickly. My problem is that I really need the original information in the same layout, but this solves my problem! I currently don't understand why np.isin returns what I thought np.where would return.
isin returns a boolean mask. np.where/np.nonzero returns the indices of the True values in that mask.
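
A small sketch of that distinction, using illustrative variable names: the mask has the same shape as the input, the indices are the positions of its True entries, and either one can be used to select the matching elements.

```python
import numpy as np

row = np.array(["part5", "part6", "part1", "part8", "part7"])

mask = np.isin(row, ["part1", "part7"])   # boolean array, same shape as row
indices = np.nonzero(mask)[0]             # positions where the mask is True

print(mask.tolist())      # [False, False, True, False, True]
print(indices.tolist())   # [2, 4]

# Both forms select the same elements.
selected_by_mask = row[mask]
selected_by_index = row[indices]
print(selected_by_mask.tolist())   # ['part1', 'part7']
```

Called with a single argument, `np.where(mask)` gives the same result as `np.nonzero(mask)`.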
