
This is my numpy.char.array:

table = np.char.array([['/finance/stocks/overview?symbol=TMIN.NS&exchange=INSE'],
['/finance/stocks/overview?symbol=8KMS.BO&exchange=INB'],
['/finance/stocks/overview?symbol=ADRG.NS&exchange=INSE']],dtype='|S53')

How can I get the desired output below?

out = ['TMIN.NS','8KMS.BO','ADRG.NS']

With table.find(".NS") I can get the index position of .NS in each string, but how can I use this to get the desired output?

In [69]: table.find(".NS")
Out[69]: 
       array([[36],
             [-1],
             [36],
             ..., 
             [36],
             [36],
             [36]])

Simple index-based selection does not work because each string is a single element of the array. The shape of the array is (30L, 1L).

I can use str methods or a regex on the individual string elements to get the desired output, but that requires running a for loop over the array. How can I do this in NumPy alone? Thanks.

Edit 1: this is how I can get the result through indexing, but I cannot do it on the whole array at once:

table[0][0][32:38]
Out[75]: 'TMIN.N'
  • Are the strings that you are looking for always of 7 characters? Would they always be followed by that string '/finance/stocks/overview?symbol='? Commented Mar 14, 2017 at 19:44
  • Yes, in this case, they will follow a fixed pattern. Commented Mar 14, 2017 at 19:47
  • And always of 7 characters? Commented Mar 14, 2017 at 19:47
  • Yes, the string to be selected will be 7 characters long. (There will be some variation, at most 8 characters, since some tickers are 5 characters long instead of 4, but at this point I will just ignore it since it is likely only about 2% of the whole dataset.) Commented Mar 14, 2017 at 19:50
  • The char functions (in this case methods of chararray) just apply the corresponding string method to each element of the array. They don't speed things up much compared to an explicit loop. I'd suggest applying your own string operation in a list comprehension. Commented Mar 14, 2017 at 19:59

2 Answers


The np.char functions/methods don't speed things up much: they just loop through the elements and apply the corresponding string method to each one.

In [261]: timeit [astr.find(".NS") for astr in table.flat]
....
100000 loops, best of 3: 3.92 µs per loop
In [262]: timeit table.find(".NS")
....
100000 loops, best of 3: 11.6 µs per loop
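As a rough illustration of that equivalence, the sketch below compares np.char.find with a plain list comprehension. It assumes Python 3 and an ordinary str array rather than the |S53 chararray from the question; the sample URLs are copied from the question.

```python
import numpy as np

# Sample data copied from the question, stored as a plain (3, 1) str array
# (an assumption; the question uses a bytes chararray).
table = np.array([
    ['/finance/stocks/overview?symbol=TMIN.NS&exchange=INSE'],
    ['/finance/stocks/overview?symbol=8KMS.BO&exchange=INB'],
    ['/finance/stocks/overview?symbol=ADRG.NS&exchange=INSE'],
])

# np.char.find calls str.find on every element and keeps the array shape.
via_numpy = np.char.find(table, '.NS')

# The explicit loop does the same work, one element at a time.
via_loop = [s.find('.NS') for s in table.flat]

print(via_numpy.ravel().tolist())  # [36, -1, 36]
print(via_loop)                    # [36, -1, 36]
```

Both routes return the same positions; the only difference is that np.char.find hands back an array shaped like the input.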

So define a simple function that isolates the desired substring (one of several possible routes):

def extract(astr):
    astr=astr.split('?')[1].split('&')[0]
    astr = astr.split('=')[1]
    return astr

In [268]: [extract(astr) for astr in table.flat]
Out[268]: ['TMIN.NS', '8KMS.BO', 'ADRG.NS']
In [269]: timeit [extract(astr) for astr in table.flat]
100000 loops, best of 3: 8.98 µs per loop
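Another of the possible routes, sketched here under the assumption of Python 3 and str (not bytes) inputs, is to let the standard library parse the query string instead of splitting by hand. The helper name extract_symbol is made up for this example, and it is not timed here.

```python
from urllib.parse import parse_qs, urlsplit


def extract_symbol(url):
    """Return the value of the `symbol` query parameter of `url`."""
    query = urlsplit(url).query          # e.g. 'symbol=TMIN.NS&exchange=INSE'
    return parse_qs(query)['symbol'][0]  # first value stored under 'symbol'


# Sample URLs copied from the question.
urls = [
    '/finance/stocks/overview?symbol=TMIN.NS&exchange=INSE',
    '/finance/stocks/overview?symbol=8KMS.BO&exchange=INB',
    '/finance/stocks/overview?symbol=ADRG.NS&exchange=INSE',
]

symbols = [extract_symbol(u) for u in urls]
print(symbols)  # ['TMIN.NS', '8KMS.BO', 'ADRG.NS']
```

This does not depend on the ticker length or on the order of the query parameters. Elements of a bytes array would need to be decoded first.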

A general observation is that with small arrays/lists the list comprehension route is often faster than the equivalent array operation. Array operations do comparatively better as the size grows.



Using slicer_vectorized, a vectorized slicing method for NumPy arrays of string dtype from this post -

In [149]: search_pattern = '/finance/stocks/overview?symbol='

In [150]: pruned_table = np.chararray.replace(table, search_pattern,'')

In [151]: slicer_vectorized(pruned_table, 0, 7)
Out[151]: 
array(['TMIN.NS', '8KMS.BO', 'ADRG.NS'], 
      dtype='|S7')

Alternatively, since we know the strings we are looking for come right after the search_pattern, we can simply slice starting at the offset given by the length of that pattern, like so -

In [167]: N  = len(search_pattern)

In [168]: slicer_vectorized(table, N,N+7)
Out[168]: 
array(['TMIN.NS', '8KMS.BO', 'ADRG.NS'], 
      dtype='|S7')
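The definition of slicer_vectorized is not reproduced in this thread. The sketch below is one way such a helper could be written, assuming a fixed-width bytes array; it reconstructs the idea and is not necessarily the code from the linked post. It views the array one byte per column, slices the columns, and reassembles the result.

```python
import numpy as np


def slicer_vectorized(a, start, end):
    """Slice bytes [start:end) out of every element of a fixed-width bytes array.

    Hypothetical reconstruction; returns a flat 1-D array.
    """
    a = np.ascontiguousarray(a)
    width = a.dtype.itemsize                     # bytes per element, e.g. 53
    chars = a.view('S1').reshape(a.size, width)  # one column per byte
    sliced = chars[:, start:end]                 # same columns from every row
    return np.frombuffer(sliced.tobytes(), dtype='S%d' % (end - start))


# Sample data copied from the question, as a (3, 1) bytes array.
table = np.array([
    [b'/finance/stocks/overview?symbol=TMIN.NS&exchange=INSE'],
    [b'/finance/stocks/overview?symbol=8KMS.BO&exchange=INB'],
    [b'/finance/stocks/overview?symbol=ADRG.NS&exchange=INSE'],
], dtype='S53')

N = len(b'/finance/stocks/overview?symbol=')
out = slicer_vectorized(table, N, N + 7)
print(out.tolist())  # [b'TMIN.NS', b'8KMS.BO', b'ADRG.NS']
```

Because the slice is positional, it only works when every symbol starts at the same offset and has the same length, as confirmed in the comments on the question.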

1 Comment

you are simply the best.
