Convert list of lists with different lengths to a numpy array [duplicate]

Question

I have a list of lists with different lengths (e.g. [[1, 2, 3], [4, 5], [6, 7, 8, 9]]) and want to convert it into a NumPy array of integers. I understand that the sub-arrays of a multidimensional NumPy array must all be the same length. What is the most efficient way to convert a list like the one above into a NumPy array padded with zeros, i.e. [[1, 2, 3, 0], [4, 5, 0, 0], [6, 7, 8, 9]]?

plasmon360 · Accepted Answer · 2017-03-31 17:19:39Z

25

You could create a NumPy array with np.zeros and fill it with your list elements, as shown below.

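import numpy as np

a = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
# Allocate a zero-filled array as wide as the longest sublist.
# np.zeros defaults to float64; pass dtype=int for an integer array.
b = np.zeros((len(a), max(len(row) for row in a)))
for i, row in enumerate(a):
    b[i, :len(row)] = row  # copy each sublist into the start of its row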

results in

[[ 1.  2.  3.  0.]
 [ 4.  5.  0.  0.]
 [ 6.  7.  8.  9.]]
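
The output above is floating point because np.zeros defaults to float64, whereas the question asks for integers. Here is a minimal sketch of the same loop wrapped in a function with the dtype passed explicitly. The name pad_rows and its parameters fill and dtype are hypothetical, chosen only for illustration.

```python
import numpy as np


def pad_rows(rows, fill=0, dtype=int):
    """Hypothetical helper: right-pad ragged rows with `fill`."""
    # Width of the widest row; 0 if `rows` is empty.
    width = max((len(r) for r in rows), default=0)
    # Pre-fill the whole array, then overwrite the start of each row.
    out = np.full((len(rows), width), fill, dtype=dtype)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r
    return out


b = pad_rows([[1, 2, 3], [4, 5], [6, 7, 8, 9]])
print(b)
```

Using np.full instead of np.zeros is a design choice that lets the same sketch pad with a value other than zero, for example fill=-1.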


Moses Koledoye · Answer · 2017-03-31 17:20:51Z

25

Do some preprocessing on the list, by padding the shorter sublists, before converting to a numpy array:

>>> import numpy as np
>>> lst = [[1, 2, 3], [4, 5], [1, 7, 8, 9]]
>>> pad = len(max(lst, key=len))  # length of the longest sublist
>>> np.array([i + [0]*(pad-len(i)) for i in lst])
array([[1, 2, 3, 0],
       [4, 5, 0, 0],
       [1, 7, 8, 9]])


3 Comments

shantanu pathak Over a year ago

I feel this is more Pythonic than the accepted answer.

Sergey Over a year ago

A more general solution, if lst is a list of 2-dimensional arrays: np.array([np.pad(i, ((0,pad-len(i)),(0,0))) for i in lst]). For arrays with more dimensions, add a (0,0) pair for every additional axis.
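
A minimal sketch of the np.pad idea from this comment, assuming every array in the list has the same number of columns and differs only in its number of rows. The names blocks and stacked are made up for illustration, and mode="constant" is spelled out so the call does not depend on the default.

```python
import numpy as np

# Three 2-D blocks with different row counts but the same column count.
blocks = [
    np.ones((3, 2), dtype=int),
    np.ones((1, 2), dtype=int),
    np.ones((2, 2), dtype=int),
]

pad = max(len(b) for b in blocks)  # largest number of rows

# One (before, after) pair per axis: append zero rows at the end of
# axis 0 and leave axis 1 untouched.
stacked = np.array(
    [np.pad(b, ((0, pad - len(b)), (0, 0)), mode="constant") for b in blocks]
)
print(stacked.shape)
```

Each block is padded to pad rows, so the result has shape (3, 3, 2).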

Amr ALHOSSARY Over a year ago

I like this answer more than all other answers

hpaulj · Answer · 2017-03-31 20:38 (edited 2017-05-23 12:17:30Z)

24

Here's a @Divakar type of answer:

In [944]: import numpy as np
In [945]: ll = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
In [946]: lens = [len(l) for l in ll]      # only iteration
In [947]: lens
Out[947]: [3, 2, 4]
In [948]: maxlen=max(lens)
In [949]: arr = np.zeros((len(ll),maxlen),int)
In [950]: mask = np.arange(maxlen) < np.array(lens)[:,None] # key line
In [951]: mask
Out[951]: 
array([[ True,  True,  True, False],
       [ True,  True, False, False],
       [ True,  True,  True,  True]], dtype=bool)
In [952]: arr[mask] = np.concatenate(ll)    # fast 1d assignment
In [953]: arr
Out[953]: 
array([[1, 2, 3, 0],
       [4, 5, 0, 0],
       [6, 7, 8, 9]])

For large lists this has the potential to be faster, since the only Python-level iteration is the one that computes the lengths. But it is harder to understand and to recreate.

The question "Convert Python sequence to NumPy array, filling missing values" has a good post by Divakar, and itertools.zip_longest is also mentioned there. It could be cited as a duplicate.
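
For reference, a minimal sketch of how itertools.zip_longest could be applied to this problem. This is one possible form and not necessarily the one used in the linked post.

```python
from itertools import zip_longest

import numpy as np

ll = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# zip_longest(*ll) yields the columns, filling the gaps left by short
# rows with fillvalue; transposing turns the columns back into rows.
arr = np.array(list(zip_longest(*ll, fillvalue=0))).T
print(arr)
```

This avoids computing the lengths explicitly, at the cost of building an intermediate list of tuples.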

edited May 23, 2017 at 12:17


answered Mar 31, 2017 at 20:38

hpaulj


2 Comments

Emma Strubell Over a year ago

This answer is far better than the accepted answer. Thanks!

Sergey Over a year ago

It doesn't work for multidimensional arrays
