
I want to create a NumPy array of np.ndarray objects from an iterable. I have a function that returns an np.ndarray of some constant shape, and I need to build an array of the results from this function, something like this:

import numpy as np

OUTPUT_SHAPE = some_constant

def foo(value) -> np.ndarray:
    # processing
    # generates an np.ndarray of shape OUTPUT_SHAPE
    return output

inputs = list(range(100000))

iterable = (foo(value) for value in inputs)
arr = np.fromiter(iterable, np.ndarray)  # fails, see the error below

This fails with an error: cannot create object arrays from iterator

I cannot build a list first and then convert it to an array, because the conversion copies every output array. For a while nearly twice the memory would be in use, and I have very limited memory.

Can anyone help me?

  • There are only two ways: collect those arrays in a list and make the array from that, or create a zeros array of the right size and assign individual "rows". As you found, fromiter isn't a way around this, nor is repeatedly concatenating onto an array. Commented Feb 18, 2022 at 22:04

2 Answers

1

You probably shouldn't make an object array. An ordinary array of non-object dtype with one extra leading dimension (2D if each output is 1D) is usually the better choice. As long as you know in advance how many results the iterator will yield, you can avoid most of the copying you're worried about by doing it like this:

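# Assumes OUTPUT_SHAPE is a tuple such as (3,); if it is a plain int, use (num_iterator_outputs, OUTPUT_SHAPE).
arr = numpy.empty((num_iterator_outputs, *OUTPUT_SHAPE), dtype=whatever_appropriate_dtype)
for i, output in enumerate(iterable):
    arr[i] = output  # copy this one result into slot i; the temporary can then be freed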

This only needs to hold arr and a single output in memory at once, instead of arr and every output.
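
A minimal runnable sketch of the preallocation approach above. The shape, the number of inputs, the dtype, and the body of foo are placeholder assumptions, since the question does not specify them:

```python
import numpy as np

OUTPUT_SHAPE = (4,)   # assumed shape of each result
NUM_INPUTS = 1000     # assumed number of inputs, known in advance


def foo(value):
    """Hypothetical stand-in for the question's foo()."""
    return np.full(OUTPUT_SHAPE, value, dtype=np.float64)


# Generator: each result is produced only when the loop asks for it.
iterable = (foo(value) for value in range(NUM_INPUTS))

# Allocate the final array once, then fill it one result at a time.
arr = np.empty((NUM_INPUTS, *OUTPUT_SHAPE), dtype=np.float64)
for i, output in enumerate(iterable):
    arr[i] = output  # copies one small result; the temporary is then discarded

print(arr.shape)  # (1000, 4)
```

Because the generator is lazy, only arr and the one result currently being copied need to exist at any moment.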


If you really want an object array, you can get one. The simplest way would be to go through a list, which will not perform the copying you're worried about as long as you do it right:

outputs = list(iterable)
arr = numpy.empty(len(outputs), dtype=object)
arr[:] = outputs

Note that if you just try to call numpy.array on outputs, it will try to build a 2D array, which will cause the copying you're worried about. This is true even if you specify dtype=object - it'll try to build a 2D array of object dtype, and that'll be even worse, for both usability and memory.
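
To see the difference described above, here is a small sketch that compares numpy.array with dtype=object against preallocating an object array and assigning into it. The four length-3 arrays are made-up stand-ins for real outputs:

```python
import numpy as np

# Hypothetical results: four arrays of identical shape (3,).
outputs = [np.arange(3) + i for i in range(4)]

# numpy.array sees a regular nested structure and builds a 2D array,
# even with dtype=object: every scalar becomes its own Python object.
nested = np.array(outputs, dtype=object)
print(nested.shape)  # (4, 3)

# Preallocating a 1D object array and assigning keeps one slot per array.
arr = np.empty(len(outputs), dtype=object)
arr[:] = outputs
print(arr.shape)  # (4,)

# Each slot refers to the original array's data rather than a copy.
same_data = all(np.shares_memory(arr[i], outputs[i]) for i in range(len(outputs)))
print(same_data)
```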

0

An object dtype array contains references, just like a list.

Define 3 arrays:

In [589]: a,b,c = np.arange(3), np.ones(3), np.zeros(3)

put them in a list:

In [590]: alist = [a,b,c]

and in an object dtype array:

In [591]: arr = np.empty(3,object)
In [592]: arr[:] = alist
In [593]: arr
Out[593]: 
array([array([0, 1, 2]), array([1., 1., 1.]), array([0., 0., 0.])],
      dtype=object)
In [594]: alist
Out[594]: [array([0, 1, 2]), array([1., 1., 1.]), array([0., 0., 0.])]

Modify one, and see the change in the list and array:

In [595]: b[:] = [1,2,3]
In [596]: b
Out[596]: array([1., 2., 3.])
In [597]: alist
Out[597]: [array([0, 1, 2]), array([1., 2., 3.]), array([0., 0., 0.])]
In [598]: arr
Out[598]: 
array([array([0, 1, 2]), array([1., 2., 3.]), array([0., 0., 0.])],
      dtype=object)

A numeric dtype array created from these copies all values:

In [599]: arr1 = np.stack(arr)
In [600]: arr1
Out[600]: 
array([[0., 1., 2.],
       [1., 2., 3.],
       [0., 0., 0.]])

So even if your use of fromiter worked, it wouldn't be any different, memory-wise, from accumulating the arrays in a list:

alist = []
for i in range(n):
    alist.append(constant_array)
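
One way to check this claim is to compare what the object array itself stores with what the element arrays store. This is a sketch with made-up sizes (five arrays of 1000 floats), not a measurement of your actual data:

```python
import numpy as np

n = 5
alist = [np.zeros(1000) for _ in range(n)]

obj_arr = np.empty(n, dtype=object)
obj_arr[:] = alist

# The object array's own buffer holds just one reference per element.
pointer_bytes = obj_arr.nbytes
# The numeric data lives in the element arrays, which the list also refers to.
data_bytes = sum(a.nbytes for a in alist)
print(pointer_bytes, data_bytes)

# The list and the object array point at the same underlying buffers.
shared = all(np.shares_memory(obj_arr[i], alist[i]) for i in range(n))

# np.stack, by contrast, allocates a new buffer and copies every value.
stacked = np.stack(alist)
copied = not np.shares_memory(stacked, alist[0])
print(shared, copied)
```

The object array adds only a small block of references on top of the list, so going through a list costs little extra memory as long as no numeric array is built from it.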
