Save numpy array of Python lists

Question

I'm trying to replicate the format of an existing data file which has the following class structure when loaded with np.load:

<class 'numpy.ndarray'>
    <class 'list'>
        <class 'list'>
           <class 'numpy.str_'>

It is an ndarray of lists of lists of strings.

I'm using the following code to build the same structure (a list of lists of lists of strings). I then try to convert only the outermost list into an ndarray, without also converting the inner lists into ndarrays.

captions = []
for row in attrs.iterrows():

    sorted_row = row[1].sort_values(ascending=False)

    attributes, variations = [], []
    for col, val in sorted_row[:20].iteritems():
        attributes.append([x[1] for x in word2Id if x[0] == col][0])
    variations.append(attributes)

    for i in range(9):
        variations.append(random.sample(attributes, len(attributes)))

    captions.append(variations)

np.save('train_captions.npy', captions)

When I open the resulting npy file, the class hierarchy is like this:

<class 'numpy.ndarray'>
    <class 'numpy.ndarray'>
        <class 'numpy.ndarray'>
           <class 'numpy.str_'>

How can I store captions in the code above so that it has the same structure as the file at the very top?

np.save can only save numpy arrays. When given the list, it first does np.array(captions). That turns the nested lists into a multidimensional array. Constructing an array of lists is tricky, especially if the lists all have the same size. Look at the array dtype and shape rather than the class hierarchy.
– hpaulj, Commented May 2, 2018 at 5:32
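To illustrate the comment above, here is a minimal sketch using a small made-up nested list (`nested` is a hypothetical stand-in for `captions`, not the asker's data). It suggests that the ndarray/ndarray/ndarray hierarchy seen in the question is what indexing into a single multidimensional string array looks like, which is why shape and dtype are more informative than the class of each level.

```python
import numpy as np

# Hypothetical stand-in for `captions`: regular nesting, all lists the same length.
nested = [[["a", "b"], ["c", "d"]], [["e", "f"], ["g", "h"]]]

# This is the conversion np.save effectively performs on a plain list.
arr = np.array(nested)

# One 3-D array of strings, not an array that contains lists.
print(arr.shape, arr.dtype)

# Indexing a multidimensional array returns sub-arrays until the last axis,
# where a numpy.str_ scalar comes back.
print(type(arr[0]), type(arr[0][0]), type(arr[0][0][0]))
```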

MrCabrac · Accepted Answer · 2019-01-13 17:09:12Z

2

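import numpy as np

# Avoid naming the variable `list`, which would shadow the built-in type.
letters = ["a", "b", "c", "d"]
np.save('list.npy', letters)  # np.save converts the list to an ndarray first
read_list = np.load('list.npy').tolist()  # .tolist() converts it back to a Python list
print(read_list, type(read_list))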

>>>['a', 'b', 'c', 'd'] <class 'list'>

If we don't use .tolist() the result is:

['a' 'b' 'c' 'd'] <class 'numpy.ndarray'>
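The same idea appears to carry over to nested lists. The sketch below assumes regular nesting (every inner list has the same length) and uses a made-up `nested` list and a temporary file path, neither of which comes from the question. Note that `.tolist()` returns plain Python `str` objects at the innermost level, not `numpy.str_`, and the loaded object itself is a list rather than an ndarray, so this does not reproduce the exact hierarchy asked for.

```python
import os
import tempfile

import numpy as np

# Hypothetical nested data with regular shape (2 x 2 x 2).
nested = [[["a", "b"], ["c", "d"]], [["e", "f"], ["g", "h"]]]

# Save to a temporary location so the example leaves no file behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "nested.npy")
np.save(path, nested)  # stored as a 3-D string array

# .tolist() converts every level of the array back into Python lists.
restored = np.load(path).tolist()

print(restored == nested)
print(type(restored), type(restored[0]), type(restored[0][0]), type(restored[0][0][0]))
```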

hpaulj · Accepted Answer · 2018-05-02 05:45:53Z

When I try to replicate your code (more or less):

In [273]: captions = []
In [274]: for r in range(2):
     ...:     attributes, variations = [], []
     ...:     for c in range(2):
     ...:         attributes.append([i for i in ['a','b','c']])
     ...:     variations.append(attributes)
     ...:     for i in range(2):
     ...:         variations.append(random.sample(attributes, len(attributes)))
     ...:     captions.append(variations)
     ...:         
In [275]: captions
Out[275]: 
[[[['a', 'b', 'c'], ['a', 'b', 'c']],
  [['a', 'b', 'c'], ['a', 'b', 'c']],
  [['a', 'b', 'c'], ['a', 'b', 'c']]],
 [[['a', 'b', 'c'], ['a', 'b', 'c']],
  [['a', 'b', 'c'], ['a', 'b', 'c']],
  [['a', 'b', 'c'], ['a', 'b', 'c']]]]

The list has several levels of nesting. When passed to np.array, the result is a 4d array of strings:

In [276]: arr = np.array(captions)
In [277]: arr.shape
Out[277]: (2, 3, 2, 3)
In [278]: arr.dtype
Out[278]: dtype('<U1')

Where possible, np.array tries to make an array with as many dimensions as it can.

To make an array of lists, we have to do something like:

In [279]: arr = np.empty(2, dtype=object)
In [280]: arr[0] = captions[0]
In [281]: arr[1] = captions[1]
In [282]: arr
Out[282]: 
array([list([[['a', 'b', 'c'], ['a', 'b', 'c']], [['a', 'b', 'c'], ['a', 'b', 'c']], [['a', 'b', 'c'], ['a', 'b', 'c']]]),
       list([[['a', 'b', 'c'], ['a', 'b', 'c']], [['a', 'b', 'c'], ['a', 'b', 'c']], [['a', 'b', 'c'], ['a', 'b', 'c']]])],
      dtype=object)
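The element-by-element assignment above can be wrapped in a small helper and checked with a save/load round trip. This is only a sketch: `to_object_array`, the sample `captions` data, and the temporary file path are illustrative names, not part of the original code. Loading an object array needs `allow_pickle=True` in current NumPy versions, and the innermost strings come back as Python `str` rather than `numpy.str_`.

```python
import os
import tempfile

import numpy as np


def to_object_array(items):
    """Return a 1-D object array whose elements are the given items, unchanged."""
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        arr[i] = item  # integer-index assignment stores the list itself
    return arr


# Hypothetical stand-in for `captions`: 2 rows, 3 variations, 4 words each.
captions = [[["a", "b", "c", "d"] for _ in range(3)] for _ in range(2)]

arr = to_object_array(captions)

path = os.path.join(tempfile.mkdtemp(), "train_captions.npy")
np.save(path, arr)  # object arrays are pickled inside the .npy file

# Object arrays are only loaded when pickling is explicitly allowed.
loaded = np.load(path, allow_pickle=True)

print(type(loaded), loaded.shape, loaded.dtype)
print(type(loaded[0]), type(loaded[0][0]), type(loaded[0][0][0]))
```

Only load pickled files from sources you trust, since unpickling can execute arbitrary code.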

Thanks, this worked. However, using the nested ndarrays works fine for the model I'm training anyway.
