Python: the fastest way to translate numpy string array to a number array

Question

Can anyone tell me the fastest way to convert this string array into a number array, as shown below?

import numpy as np
strarray = np.array([["123456"], ["654321"]])

     to

numberarray = np.array([[1,2,3,4,5,6], [6,5,4,3,2,1]])

Mapping each str to a list and then mapping each character to int is too slow for a large array!

Please help!

Possible duplicate of How to convert an array of strings to an array of floats in numpy?
– idjaw, Commented Feb 24, 2016 at 13:22
Are all elements guaranteed to have the same length (like it's 6 in the sample case)?
– Divakar, Commented Feb 24, 2016 at 13:32

score 3 · Accepted Answer · 2016-02-24 17:21:23Z

3

You can split the strings into single characters with the array view method:

In [18]: strarray = np.array([[b"123456"], [b"654321"]])

In [19]: strarray.dtype
Out[19]: dtype('S6')

In [20]: strarray.view('S1')
Out[20]: 
array([['1', '2', '3', '4', '5', '6'],
       ['6', '5', '4', '3', '2', '1']], 
      dtype='|S1')

See the NumPy documentation for data type character codes.

Then the most obvious next step is to use astype:

In [23]: strarray.view('S1').astype(int)
Out[23]: 
array([[1, 2, 3, 4, 5, 6],
       [6, 5, 4, 3, 2, 1]])

However, it's a lot faster to reinterpret (view) the memory underlying the strings as single-byte integers and subtract 48. This works because each ASCII character takes up a single byte, and the characters '0' through '9' are stored as the byte values 48 through 57, which is exactly what a (u)int8 view reads (check the ord builtin).
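As a minimal, self-contained sketch of that idea (assuming every element is a byte string of the same length containing only the characters '0' through '9'; the variable name `digits` is just for illustration):

```python
import numpy as np

# Byte strings of equal length, so the array gets dtype 'S6'.
strarray = np.array([[b"123456"], [b"654321"]])

# Reinterpret every byte as an unsigned 8-bit integer (no copy is made),
# then subtract the ASCII code of '0' (48) to obtain the digit values.
digits = strarray.view(np.uint8) - ord("0")

print(digits)
print(digits.dtype)
```

The result stays uint8, so any character that is not a digit would silently produce a wrong value rather than raise an error. Validate the input first if that can happen.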

Speed comparison:

In [26]: ar = np.array([[''.join(np.random.choice(list('123456789'), size=320))] for _ in range(1000)], bytes)

In [27]: %timeit _ = ar.view('S1').astype(np.uint8)
1 loops, best of 3: 284 ms per loop

In [28]: %timeit _ = ar.view(np.uint8) - ord('0')
1000 loops, best of 3: 1.07 ms per loop

If you have Unicode strings instead of ASCII, these steps need to be done slightly differently. Alternatively, just convert to ASCII first with astype(bytes).
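For Python 3 str input, both routes might look like the sketch below. The second option relies on NumPy storing each Unicode character as a 4-byte code point, so the view uses uint32 rather than uint8. It also assumes the array is contiguous and in native byte order, as a freshly created array is. The names `digits_a` and `digits_b` are illustrative only.

```python
import numpy as np

# Python 3 str elements give a Unicode dtype ('<U6' on little-endian machines).
strarray = np.array([["123456"], ["654321"]])

# Option 1: convert to ASCII bytes first, then reuse the uint8 view trick.
digits_a = strarray.astype(bytes).view(np.uint8) - ord("0")

# Option 2: view the Unicode buffer directly. Each character occupies
# 4 bytes, so reinterpret it as uint32 instead of uint8.
digits_b = strarray.view(np.uint32) - ord("0")

print(digits_a)
print(digits_b)
```

Option 1 makes a converted copy of the data, while option 2 only reinterprets the existing buffer.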

edited Feb 24, 2016 at 17:21

answered Feb 24, 2016 at 16:12

user2379410


6 Comments

Divakar Over a year ago

Could be a version issue: I am getting unicode for strarray.dtype. I am on Python 3.4, and ar.view('S1') has "b'" all over, along with the strings themselves.

user2379410 Over a year ago

@Divakar - I changed the strings to bytes for Python 3 compatibility.

Divakar Over a year ago

But if OP has those as strings, he/she has to convert to bytes first, right? How could that be done?

user2379410 Over a year ago

@Divakar - Python 2.x has ASCII strings as default and for those it works.

Divakar Over a year ago

Ah yes you have mentioned .astype(bytes) for the conversion in the post! Nice, works for me now.


Divakar · Accepted Answer · 2016-02-24 16:30:25Z

0

Here's an approach that converts each input string to a 1D numeric array of length N, where N is the length of each string (all strings are assumed to have the same length). It first converts each string to its integer value, divides that value by successive powers of 10 to obtain the truncated prefixes, and then recovers each digit as the difference between a prefix and 10 times the next shorter prefix. The implementation looks like this -

A = (strarray.astype(int)/(10**np.arange(len(strarray[0][0])))).astype(int)
out = np.column_stack((A[:,-1],(A[:,:-1] - 10*A[:,1:])[:,::-1]))

Sample run -

In [177]: strarray  = np.array([["0308468"], ["6540542"], ["4973473"]])

In [178]: A = (strarray.astype(int)/(10**np.arange(len(strarray[0][0])))).astype(int)
     ...: out = np.column_stack((A[:,-1],(A[:,:-1] - 10*A[:,1:])[:,::-1]))
     ...: 

In [179]: out
Out[179]: 
array([[0, 3, 0, 8, 4, 6, 8],
       [6, 5, 4, 0, 5, 4, 2],
       [4, 9, 7, 3, 4, 7, 3]])
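A variant of the same idea, sketched with floor division (`//`) instead of `/` followed by `astype(int)`. This substitution is not part of the original answer; it is an assumption made here to keep everything in integer arithmetic and so avoid float rounding on long digit strings. It still assumes equal-length strings whose values fit in int64.

```python
import numpy as np

strarray = np.array([["0308468"], ["6540542"], ["4973473"]])
n = len(strarray[0][0])  # all strings are assumed to share this length

# A[:, k] is the integer value with its last k digits dropped.
powers = 10 ** np.arange(n, dtype=np.int64)
A = strarray.astype(np.int64) // powers

# The last column of A is the leading digit. Every other digit is
# A[:, k] - 10 * A[:, k + 1], reversed to run from most to least significant.
out = np.column_stack((A[:, -1], (A[:, :-1] - 10 * A[:, 1:])[:, ::-1]))

print(out)
```

Because the digits are recovered arithmetically, leading zeros are preserved only through the fixed length `n` taken from the first string.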

answered Feb 24, 2016 at 16:30

Divakar


1 Comment

zshtom Over a year ago

Tricky solution! Thanks for providing this method for lighting me up!
