32

I have a list of lists with different lengths (e.g. [[1, 2, 3], [4, 5], [6, 7, 8, 9]]) and want to convert it into a NumPy array of integers. I understand that the sub-arrays of a multidimensional NumPy array must all be the same length. So what is the most efficient way to convert a list like the one above into a NumPy array like [[1, 2, 3, 0], [4, 5, 0, 0], [6, 7, 8, 9]], i.e. padded with zeros?


3 Answers

25

You could create a NumPy array with np.zeros and fill it with your list elements, as shown below.

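import numpy as np

a = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
# Allocate a zero-filled array: one row per sublist, as wide as the longest sublist.
b = np.zeros([len(a), len(max(a, key=len))])
for i, row in enumerate(a):
    b[i, :len(row)] = row  # copy each sublist into the start of its row
print(b)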

This results in a float array (np.zeros defaults to float64; pass dtype=int to get integers):

[[ 1.  2.  3.  0.]
 [ 4.  5.  0.  0.]
 [ 6.  7.  8.  9.]]

25

Do some preprocessing on the list, by padding the shorter sublists, before converting to a numpy array:

>>> import numpy as np
>>> lst = [[1, 2, 3], [4, 5], [1, 7, 8, 9]]
>>> pad = len(max(lst, key=len))
>>> np.array([i + [0]*(pad-len(i)) for i in lst])
array([[1, 2, 3, 0],
       [4, 5, 0, 0],
       [1, 7, 8, 9]])

3 Comments

I feel this is more Pythonic than the accepted answer.
A more generic solution, if lst is a list of 2-dimensional arrays: np.array([np.pad(i, ((0,pad-len(i)),(0,0))) for i in lst]). For arrays with more dimensions, add a (0,0) pair for every additional axis.
I like this answer more than all the other answers.
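
To make the np.pad comment above concrete, here is a minimal sketch. The input arrays are made up for illustration (2-D arrays that differ only in their number of rows), and mode="constant" is passed explicitly so the zero padding is visible in the call:

```python
import numpy as np

# Hypothetical input: 2-D arrays with different row counts but the same column count.
lst = [np.ones((2, 3), dtype=int),
       np.ones((1, 3), dtype=int),
       np.ones((3, 3), dtype=int)]
pad = max(len(a) for a in lst)  # length of the longest first axis

# Pad only the end of axis 0; the (0, 0) pair leaves axis 1 untouched.
stacked = np.array(
    [np.pad(a, ((0, pad - len(a)), (0, 0)), mode="constant") for a in lst]
)
print(stacked.shape)
```

Each array is padded with rows of zeros until it has pad rows, so the results can be stacked into one 3-D array.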
24

Here's a @Divakar type of answer:

In [945]: ll = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
In [946]: lens = [len(l) for l in ll]      # only iteration
In [947]: lens
Out[947]: [3, 2, 4]
In [948]: maxlen=max(lens)
In [949]: arr = np.zeros((len(ll),maxlen),int)
In [950]: mask = np.arange(maxlen) < np.array(lens)[:,None] # key line
In [951]: mask
Out[951]: 
array([[ True,  True,  True, False],
       [ True,  True, False, False],
       [ True,  True,  True,  True]], dtype=bool)
In [952]: arr[mask] = np.concatenate(ll)    # fast 1d assignment
In [953]: arr
Out[953]: 
array([[1, 2, 3, 0],
       [4, 5, 0, 0],
       [6, 7, 8, 9]])

For large lists this has the potential to be faster, but it is harder to understand and/or recreate.
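
If the session above is hard to recreate, the same mask idea can be wrapped in a small function. This is only a sketch: the name pad_ragged and its fill and dtype parameters are my own, not part of NumPy, and it assumes a non-empty list of 1-D sequences:

```python
import numpy as np

def pad_ragged(rows, fill=0, dtype=int):
    """Hypothetical helper: pad a non-empty list of 1-D sequences into a 2-D array."""
    lens = np.array([len(r) for r in rows])       # the only Python-level iteration
    maxlen = lens.max()
    out = np.full((len(rows), maxlen), fill, dtype=dtype)
    mask = np.arange(maxlen) < lens[:, None]      # True where a real value belongs
    out[mask] = np.concatenate(rows)              # one flat assignment, row by row
    return out

arr = pad_ragged([[1, 2, 3], [4, 5], [6, 7, 8, 9]])
print(arr)
```

The boolean mask is filled in row-major order, which is why the flat output of np.concatenate lines up with the True positions.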

The question "Convert Python sequence to NumPy array, filling missing values" has a good post by Divakar; itertools.zip_longest is also mentioned there. This could be cited as a duplicate.
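
For reference, a zip_longest version might look like the sketch below. It is my own illustration of the idea, not code copied from the linked post:

```python
from itertools import zip_longest

import numpy as np

ll = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

# zip_longest(*ll) yields the columns, filling gaps in the short rows with
# fillvalue; transposing turns the columns back into rows.
arr = np.array(list(zip_longest(*ll, fillvalue=0))).T
print(arr)
```

This avoids computing the maximum length by hand, at the cost of building the intermediate tuples in Python.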

2 Comments

This answer is far better than the accepted answer. Thanks!
It doesn't work for multidimensional arrays.
