Representing a list of strings as a numpy array of their ascii codes

Question

I have a list of strings. I would like to convert it to a 2D numpy array, where result[i, j] is the ASCII code of the j-th character of the i-th string (preferably as a float).

I know I can use list(map(float, map(ord, single_line_from_list))) to get a list of floats for one string, convert it to a 1D array, and then loop over all the strings to build the final array. But I wonder if there's a more elegant way to do this.

Is there a particular reason you're using a list of str instead of an ndarray with one of numpy's string types?
– o11c, Commented Aug 30, 2017 at 1:30
Also, I'm not sure what you think you gain by having dtype=float when all the values fit in dtype=uint8, which is much less storage and the values usually convert as needed.
– o11c, Commented Aug 30, 2017 at 1:32
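
To illustrate the two comments above, here is a minimal sketch, assuming a non-empty list in which every string is pure ASCII. It stores the strings in a fixed-width NumPy bytes array and reinterprets the buffer as uint8 codes. The variable names are illustrative and not from the question; shorter strings come out padded with zeros (NUL bytes).

```python
import numpy as np

words = ['hello', 'world!!']  # hypothetical example input

# dtype='S' encodes each str as ASCII bytes in a fixed-width field;
# a non-ASCII character would raise an encoding error here.
as_bytes = np.array(words, dtype='S')

# Reinterpret the same buffer as one uint8 per character, one row per string.
codes = as_bytes.view(np.uint8).reshape(len(words), -1)

# Cast only if floats are really needed downstream.
codes_float = codes.astype(float)
```

Whether this fits depends on the data: it only works for ASCII input, and the padding zeros cannot be told apart from genuine NUL characters.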

stamaimer · Accepted Answer · 2017-08-30 01:23:25Z


You can use a nested list comprehension.

import numpy as np

words = ['hello', 'world']  # example input; the original snippet assumes `words` already exists

array = np.array([[float(ord(character)) for character in word] for word in words])

answered Aug 30, 2017 at 1:23


2 Comments

o11c Over a year ago

Pre-building an ndarray and then filling it will avoid the temporaries.

Lugi Over a year ago

This doesn't actually return a 2D array, just an array of lists. Any idea how to fix that (for example, by padding the strings that are shorter than the max length with zeros)?
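
A minimal sketch of what these two comments describe, assuming zero is an acceptable padding value: pre-allocate a zero-filled array as wide as the longest string, then fill it row by row. The helper name to_padded_codes is hypothetical, not from the thread.

```python
import numpy as np

def to_padded_codes(words, dtype=float):
    """Return a 2D array of character codes, zero-padded on the right."""
    # Width of the result is the length of the longest string (0 for an empty list).
    width = max(map(len, words), default=0)
    out = np.zeros((len(words), width), dtype=dtype)
    for i, word in enumerate(words):
        # Only the first len(word) cells of row i are overwritten; the rest stay 0.
        out[i, :len(word)] = [ord(c) for c in word]
    return out

result = to_padded_codes(['hello', 'world!!'])
```

This avoids the ragged nested list entirely, so the result is a proper 2D array even when the strings differ in length. Depending on the NumPy version, passing a ragged nested list to np.array either yields an object array of lists or raises an error.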

akuiper · Accepted Answer · 2017-08-30 01:37:02Z

One option is to create a sparse matrix using scipy.sparse.coo_matrix and then convert it to a dense one:

from scipy.sparse import coo_matrix

lst = ['hello', 'world!!']

idx, idy, val = zip(*((i, j, ord(c)) for i, s in enumerate(lst) for j, c in enumerate(s)))   
coo_matrix((val, (idx, idy)), shape=(max(idx)+1, max(idy)+1)).todense()

#matrix([[104, 101, 108, 108, 111,   0,   0],
#        [119, 111, 114, 108, 100,  33,  33]])

Or use zip_longest from itertools (it is named izip_longest on Python 2):

from itertools import zip_longest  # Python 2: from itertools import izip_longest

list(zip(*zip_longest(*map(lambda s: map(ord, s), lst))))
# [(104, 101, 108, 108, 111, None, None), (119, 111, 114, 108, 100, 33, 33)]

This gives a list of row tuples. You can use the fillvalue parameter to replace the Nones:

list(zip(*zip_longest(*map(lambda s: map(ord, s), lst), fillvalue=0)))
# [(104, 101, 108, 108, 111, 0, 0), (119, 111, 114, 108, 100, 33, 33)]
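
To get from the padded tuples to the float array the question asks for, one possible sketch (Python 3, with illustrative variable names) builds the array from the zip_longest columns and transposes it, which replaces the outer zip(*...):

```python
from itertools import zip_longest

import numpy as np

lst = ['hello', 'world!!']

# zip_longest yields one tuple per character position (a column),
# filling positions past the end of a shorter string with 0.
columns = zip_longest(*(map(ord, s) for s in lst), fillvalue=0)

# Stack the columns, then transpose so that each row is one string.
codes = np.array(list(columns), dtype=float).T
```

Dropping dtype=float keeps the integer codes instead.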
