Convert a numpy array of strings to a numpy matrix

Question

So I have a numpy array of strings that contain numeric values separated by spaces, for example:

np.array(['1 2', '3 4'])
array(['1 2', '3 4'], dtype='<U3')

and I want to convert it to a numerical matrix like:

np.array([[1,2],[3,4]])
array([[1, 2],[3, 4]])

I'm looking for an operation that can leverage numpy's vectorized operations, as speed is important here. The rows have length 2 in this example, but I need a general approach that works for an arbitrary row length.

Thanks!

Possible duplicate of Convert string numpy.ndarray to float numpy.ndarray
– Georgy, Commented Jul 15, 2019 at 10:03
I came up with two vectorized solutions using np.char.split and pd.Series.str.split, but both of them are slower than the native Python loops in the accepted answer of the duplicate target.
– Georgy, Commented Jul 15, 2019 at 11:01
@Georgy Can you post these solutions? Maybe they are faster with bigger arrays, which is my case.
– Msegade, Commented Jul 15, 2019 at 11:18
@Georgy Thanks! I think the solution with np.char.split should be the fastest. I posted an issue in the numpy tracker.
– Msegade, Commented Jul 15, 2019 at 15:25
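
For reference, the np.char.split idea mentioned in these comments might look roughly like the following. This is only a minimal sketch, not Georgy's actual code, and it assumes every row holds the same number of whitespace-separated integers.

```python
import numpy as np

a = np.array(['1 2', '3 4'])

# np.char.split returns an object array holding one Python list per row.
tokens = np.char.split(a)

# Stack the lists into a 2-D string array, then cast to integers.
# Assumption: all rows contain the same number of values.
result = np.array(tokens.tolist()).astype(np.int64)
print(result)
```

Note that the splitting still happens string by string under the hood, which fits the observation above that it does not beat a plain list comprehension.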

Paul Panzer · Accepted Answer · 2019-07-15 14:52:04Z

1

Here is an approach that assumes nonnegative ints arriving in pairs, separated by a single space:

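import numpy as np

def to_num(x):
    # Code points minus ord('0'), scaled by decimal place value;
    # the space and any trailing padding come out negative.
    y = (x[:,None].view(np.int32)-48)*10**np.arange(x.itemsize//4-1,-1,-1)
    # Column index of the separating space in each row.
    splt = y.argmin(1)
    # Running sums just before the space and at the end of the row.
    z = np.take_along_axis(y.cumsum(1),np.column_stack([splt-1,np.full(*y.shape-np.arange(2))]),1)
    # Second number: remove the first number and the space's contribution.
    z[:,1]+=10**(y.shape[1]-splt-1)*16-z[:,0]
    # First number: undo the place-value scaling.
    z[:,0]//=10**(y.shape[1]-splt)
    # Count trailing padding characters, cancel their contribution, rescale.
    end = (y[:,::-1]>=0).argmax(1)
    z[:,1]+=np.concatenate([[0],48*np.cumsum(10**np.arange(end.max()))])[end]
    z[:,1]//=10**end
    return z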

For example, converting 10^6 pairs ten times takes roughly 3 secs in total on my machine (about 0.3 secs per call):

from timeit import timeit

x = np.random.randint(0,1000000,(1000000,2))
x = np.array([" ".join(map(str, y)) for y in x.tolist()])

(to_num(x) == [[int(z) for z in y.split()] for y in x.tolist()]).all()
# True
timeit(lambda:to_num(x), number=10)
# 2.9360161621589214



1 Comment

Msegade Over a year ago

This seems to work, but it assumes that each row has length 2, and I'm interested in arrays with longer rows. I'll add that info to the question.
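
For rows of arbitrary (but equal) length, one simple possibility is to join everything into a single string, split once, and reshape. This is only a sketch and not the approach of the answer above; `strings_to_matrix` is a hypothetical helper name, and the code assumes every row contains the same number of whitespace-separated integers.

```python
import numpy as np

def strings_to_matrix(a):
    # Hypothetical helper: join all rows, split once, cast, reshape.
    # Assumption: every row holds the same number of integers.
    tokens = ' '.join(a.tolist()).split()
    flat = np.array(tokens).astype(np.int64)
    return flat.reshape(len(a), -1)

a = np.array(['1 2 3', '40 50 60'])
m = strings_to_matrix(a)
print(m)
```

The parsing is still done by Python string methods, so whether this is faster than a per-row list comprehension would have to be measured on the real data.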

some_name.py · Accepted Answer · 2019-07-15 09:50:25Z

0

If it doesn't have to be that fast, you could iterate over the array element-wise and apply:

import numpy as np

def separate_string(s):
    # Split on single spaces and cast the pieces to integers.
    split_numbers = s.split(' ')
    output = np.asarray(split_numbers).astype(int)
    return output


separate_string('1 1')
# array([1, 1])
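
To build the full matrix, a per-string function like this can be applied to each element and the results stacked. The following is a minimal sketch; `split_row` is a hypothetical stand-in name for the function above, and it assumes all rows have the same number of values.

```python
import numpy as np

def split_row(s):
    # Hypothetical stand-in: split one string and cast the pieces to int.
    return np.asarray(s.split(' ')).astype(int)

a = np.array(['1 2', '3 4'])

# Apply the function to every string and stack the 1-D results row-wise.
matrix = np.stack([split_row(s) for s in a])
print(matrix)
```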


3 Comments

Msegade Over a year ago

Thanks for the answer, but speed is critical here, I was looking for some type of vectorized operation. I will add the info to the question.

some_name.py Over a year ago

Are there int values only? And are they restricted to [0,9]?

Msegade Over a year ago

Yes, they are integers, but they are not restricted to [0,9]

Hugo Chittaro · Accepted Answer · 2019-07-15 09:58:08Z

0

First, split each string on whitespace; once that's done, have a look at the function numpy.asmatrix().
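
A minimal sketch of what that could look like, assuming integer values and rows of equal length (the variable names are only illustrative):

```python
import numpy as np

a = np.array(['1 2', '3 4'])

# Split every string on whitespace and convert the pieces to int.
rows = [[int(t) for t in s.split()] for s in a]

# np.asmatrix interprets the nested list as a 2-D np.matrix.
m = np.asmatrix(rows)
print(m)
```

If a plain ndarray is enough, `np.array(rows)` gives the same values without the np.matrix wrapper.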


1 Comment

hpaulj Over a year ago

np.matrix(';'.join(a)) uses the '1 2; 3 4' syntax that np.matrix accepts. But this is slower than the list comprehensions: np.matrix still has to use string operations to split the lines and numbers, just reversing our join. It doesn't use fast compiled code to do that.
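
For completeness, a short sketch of the string syntax this comment refers to, shown only to illustrate the call and not as a speed recommendation:

```python
import numpy as np

a = np.array(['1 2', '3 4'])

# Join the rows with ';' to get the "row; row" syntax np.matrix parses.
m = np.matrix(';'.join(a))

# Convert back to a regular ndarray if the matrix type is not wanted.
arr = np.asarray(m)
print(arr)
```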
