numpy char.array get set of characters from a string

Question

This is my numpy.char.array:

table = np.char.array([['/finance/stocks/overview?symbol=TMIN.NS&exchange=INSE'],
['/finance/stocks/overview?symbol=8KMS.BO&exchange=INB'],
['/finance/stocks/overview?symbol=ADRG.NS&exchange=INSE']],dtype='|S53')

How can I get the desired output below?

out = ['TMIN.NS','8KMS.BO','ADRG.NS']

With table.find(".NS") I can get the index position of .NS in each string. But how can I use this to get the desired output?

In [69]: table.find(".NS")
Out[69]: 
       array([[36],
             [-1],
             [36],
             ..., 
             [36],
             [36],
             [36]])

Simple index-based selection does not work because each whole string is a single element of the array; the shape of the array is (30L, 1L).

I can use str methods or a regex on individual string elements to get the desired output, but that requires running a for loop over the array. How can I do this in NumPy alone? Thanks.

edit_1/ This is how I can get the result through indexing, but I cannot do it on the whole array at once:

table[0][0][32:38]
Out[75]: 'TMIN.N'

Are the strings that you are looking for always 7 characters long? Would they always be preceded by the string '/finance/stocks/overview?symbol='?
– Divakar, Commented Mar 14, 2017 at 19:44
Yes, the string to be selected will be 7 characters long. (There will be some variation, at most 8 characters, since some tickers are 5 characters long instead of 4, but at this point I will just ignore it since it is likely only about 2% of the whole dataset.)
– Siraj S., Commented Mar 14, 2017 at 19:50
The char functions (in this case methods of chararray) just apply the corresponding string method to each element of the array. They don't speed things up much compared to an explicit loop. I'd suggest applying your own string operation in a list comprehension.
– hpaulj, Commented Mar 14, 2017 at 19:59

hpaulj · Accepted Answer · 2017-03-14 20:13:57Z

1

The np.char functions/methods don't speed things up much; they just loop through the elements and apply the corresponding string method to each one.

In [261]: timeit [astr.find(".NS") for astr in table.flat]
....
100000 loops, best of 3: 3.92 µs per loop
In [262]: timeit table.find(".NS")
....
100000 loops, best of 3: 11.6 µs per loop

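So one option (of several possible routes) is to define a simple function that isolates the desired substring:

def extract(astr):
    # keep the query string, then its first key=value pair, e.g. 'symbol=TMIN.NS'
    astr = astr.split('?')[1].split('&')[0]
    # drop the key and keep only the value, e.g. 'TMIN.NS'
    astr = astr.split('=')[1]
    return astr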

In [268]: [extract(astr) for astr in table.flat]
Out[268]: ['TMIN.NS', '8KMS.BO', 'ADRG.NS']
In [269]: timeit [extract(astr) for astr in table.flat]
100000 loops, best of 3: 8.98 µs per loop

A general observation is that with small arrays/lists the list comprehension route is often faster than the equivalent array operation. Array operations get relatively better as the size grows.
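Another of the possible routes, sketched here for Python 3: since each element is a URL path with a query string, the standard library's URL parser can pull out the symbol value, which also copes with the tickers that are not exactly 7 characters long. The function name extract_symbol is hypothetical, and the sketch assumes ASCII byte strings stored in a plain NumPy array rather than a chararray.

```python
from urllib.parse import urlsplit, parse_qs

import numpy as np


def extract_symbol(url):
    """Return the value of the 'symbol' query parameter (hypothetical helper)."""
    # Elements of an 'S' dtype array are bytes; decode them before parsing.
    if isinstance(url, bytes):
        url = url.decode('ascii')
    query = urlsplit(url).query          # 'symbol=TMIN.NS&exchange=INSE'
    return parse_qs(query)['symbol'][0]  # first value given for 'symbol'


table = np.array(
    [[b'/finance/stocks/overview?symbol=TMIN.NS&exchange=INSE'],
     [b'/finance/stocks/overview?symbol=8KMS.BO&exchange=INB'],
     [b'/finance/stocks/overview?symbol=ADRG.NS&exchange=INSE']],
    dtype='S53')

# Still a Python-level loop, as with extract() above.
symbols = [extract_symbol(s) for s in table.flat]
print(symbols)
```

This is no faster than the split-based version; the trade-off is that it does not depend on the position of the symbol parameter within the query string.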

edited Mar 14, 2017 at 20:13

answered Mar 14, 2017 at 20:04

hpaulj


Community · Accepted Answer · 2017-05-23 12:32:05Z

0

Using slicer_vectorized, a vectorized slicing helper for NumPy arrays of string dtype, from this post:

In [149]: search_pattern = '/finance/stocks/overview?symbol='

In [150]: pruned_table = np.chararray.replace(table, search_pattern,'')

In [151]: slicer_vectorized(pruned_table, 0, 7)
Out[151]: 
array(['TMIN.NS', '8KMS.BO', 'ADRG.NS'], 
      dtype='|S7')

Alternatively, since we know the strings we are looking for sit right after search_pattern, we can skip the replace step and slice the original array starting at an offset equal to the length of that pattern:

In [167]: N  = len(search_pattern)

In [168]: slicer_vectorized(table, N,N+7)
Out[168]: 
array(['TMIN.NS', '8KMS.BO', 'ADRG.NS'], 
      dtype='|S7')
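The slicer_vectorized helper comes from the linked post and its definition is not shown in this answer. A minimal sketch of how such a helper could be written, assuming a fixed-width byte-string ('S') array and a slice that lies within the item width; the body is a reconstruction for illustration and not necessarily the code from that post:

```python
import numpy as np


def slicer_vectorized(a, start, end):
    """Return characters [start:end) of every element of a fixed-width bytes array."""
    a = np.ascontiguousarray(a)  # plain, C-contiguous ndarray
    # View every element as a row of single bytes: shape (n_strings, itemsize).
    chars = a.view('S1').reshape(a.size, -1)
    # Slice the byte columns, then reinterpret each row as one shorter string.
    sliced = chars[:, start:end]
    return np.frombuffer(sliced.tobytes(), dtype='S%d' % (end - start))


table = np.array(
    [[b'/finance/stocks/overview?symbol=TMIN.NS&exchange=INSE'],
     [b'/finance/stocks/overview?symbol=8KMS.BO&exchange=INB'],
     [b'/finance/stocks/overview?symbol=ADRG.NS&exchange=INSE']],
    dtype='S53')

search_pattern = b'/finance/stocks/overview?symbol='
N = len(search_pattern)
out = slicer_vectorized(table, N, N + 7)
print(out)
```

The slicing happens on the raw character buffer, so no Python-level string method is called per element; the cost is that every result has the same fixed width.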

edited May 23, 2017 at 12:32

CommunityBot

answered Mar 14, 2017 at 19:54

Divakar

1 Comment

Siraj S. Over a year ago

you are simply the best.
