I have a list of lists (of variable length) that needs to be converted into a NumPy array. Example:

import numpy as np

sample_list = [["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]
sample_arr = np.asarray(sample_list)

>>> sample_arr
array([list(['hello', 'world']), list(['foo']),
       list(['alpha', 'beta', 'gamma']), list([])], dtype=object)

>>> sample_arr.shape
(4,)

In the above example, I got a one-dimensional array, which is what I want; the downstream modules of my code expect it. However, when the lists all have the same length, np.asarray outputs a 2-dimensional array, which causes errors in the downstream modules:

sample_list = [["hello"], ["world"], ["foo"], ["bar"]]
sample_arr = np.asarray(sample_list)

>>> sample_arr
array([['hello'],
       ['world'],
       ['foo'],
       ['bar']], dtype='<U5')
>>> sample_arr.shape
(4, 1)

Instead, I want output similar to the first example:

>>> sample_arr
array([list(['hello']), list(['world']),
       list(['foo']), list(['bar'])], dtype=object)

Is there any way I can achieve that?

3 Answers

In your first case, np.array gives us a warning (in NumPy 1.19 through 1.23; from 1.24 on it raises a ValueError unless dtype=object is given). That should tell us something: using np.array to make ragged arrays is not ideal. np.array is meant to create regular multidimensional arrays with numeric (or string) dtypes. Creating an object dtype array like this is a fallback option.

In [96]: sample_list = [["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]
In [97]: arr = np.array(sample_list)
<ipython-input-97-ec7d58f98892>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  arr = np.array(sample_list)
In [98]: arr
Out[98]: 
array([list(['hello', 'world']), list(['foo']),
       list(['alpha', 'beta', 'gamma']), list([])], dtype=object)

In many ways such an array is a debased list, not a true array.

In the second case it works as intended (by the developers, if not by you!):

In [99]: sample_list = [["hello"], ["world"], ["foo"], ["bar"]]
In [100]: arr = np.array(sample_list)
In [101]: arr
Out[101]: 
array([['hello'],
       ['world'],
       ['foo'],
       ['bar']], dtype='<U5')
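
As a small check of why the warning's suggestion is not enough on its own, here is a sketch that passes dtype=object explicitly for both lists. The variable names ragged and uniform are mine, not from the question. The ragged list comes out 1-D, but the uniform one is still 2-D; only the element dtype changes.

```python
import numpy as np

ragged = [["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]
uniform = [["hello"], ["world"], ["foo"], ["bar"]]

# Explicit dtype=object: no ragged-sequence warning, result is 1-D.
ragged_arr = np.array(ragged, dtype=object)
print(ragged_arr.shape)   # (4,)

# Same call on equal-length lists: NumPy still descends into the
# sub-lists, so the result is a (4, 1) array of str objects.
uniform_arr = np.array(uniform, dtype=object)
print(uniform_arr.shape)  # (4, 1)
```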

To work around that, I recommend making an object dtype array of the right size, and populating it from the list:

In [102]: arr = np.empty(len(sample_list), object)
In [103]: arr
Out[103]: array([None, None, None, None], dtype=object)
In [104]: arr[:] = sample_list
In [105]: arr
Out[105]: 
array([list(['hello']), list(['world']), list(['foo']), list(['bar'])],
      dtype=object)
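
If this is needed in several places, the same idea can be wrapped in a helper. This is only a sketch: the name to_object_array is hypothetical, and it fills the array one element at a time instead of using the slice assignment above, so that each sub-list is stored as a single object regardless of the lengths involved.

```python
import numpy as np

def to_object_array(nested):
    """Return a 1-D object array holding each item of `nested` as-is."""
    out = np.empty(len(nested), dtype=object)
    for i, item in enumerate(nested):
        # Integer indexing into an object array stores the list itself.
        out[i] = item
    return out

same_len = to_object_array([["hello"], ["world"], ["foo"], ["bar"]])
ragged = to_object_array([["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []])
print(same_len.shape, ragged.shape)  # (4,) (4,)
```

Both inputs give a shape of (4,), which is what the downstream code in the question expects.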

Yes, it's possible! You can define a function that converts the list of lists into a single list that contains all items as follows.

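import numpy as np

def flatten_list(nested_list):
    # Concatenate every sub-list into one flat list of items.
    single_list = []
    for item in nested_list:
        single_list.extend(item)
    return single_list

# Note: the result is a flat 1-D array of strings, not an array of lists.
sample_arr = np.asarray(flatten_list([["hello", "world"], ["foo"], ["alpha", "beta", "gamma"], []]))
print(sample_arr)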

A quick and dirty Pythonic approach is to use a list comprehension:

sample_arr = np.asarray([[j] for sub in sample_list for j in sub])

A little more info on list comprehensions if you're interested: https://www.w3schools.com/python/python_lists_comprehension.asp

2 Comments

Output is an array of strings, ['hello' 'world' 'foo' 'bar'], not what OP expects.
This new comprehension is simply equal to sample_list 😅
