How does numpy allocate memory for nested array?

Question

The following program creates a large array from a nested list of arrays:

import numpy as np
a = np.arange(6).reshape(2, 3)
nested_list = [[a, a + 1], [a + 2, a + 3]]
b = np.array(nested_list)

In this case, does np.array allocate memory for the result only once before copying the data into it?

Or is this similar to:

c = np.vstack([np.hstack([a, a + 1]), np.hstack([a + 2, a + 3])])

which would allocate memory three times?

>>> b
array([[[[0, 1, 2],
         [3, 4, 5]],

        [[1, 2, 3],
         [4, 5, 6]]],


       [[[2, 3, 4],
         [5, 6, 7]],

        [[3, 4, 5],
         [6, 7, 8]]]])
>>> c
array([[0, 1, 2, 1, 2, 3],
       [3, 4, 5, 4, 5, 6],
       [2, 3, 4, 3, 4, 5],
       [5, 6, 7, 6, 7, 8]])
>>> b.shape
(2, 2, 2, 3)
>>> b.reshape(2*2, 2*3)
array([[0, 1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5, 6],
       [2, 3, 4, 5, 6, 7],
       [3, 4, 5, 6, 7, 8]])

Hmm... these two results are different. But I still want to know how numpy allocates memory for b.
– R zu, Commented Nov 28, 2017 at 15:43
In numpy 1.13.x we can use numpy.block, which seems to allocate memory multiple times.
– R zu, Commented Nov 28, 2017 at 16:08
np.block(nested_list) creates something different, a 2d array, and is quite a bit slower.
– hpaulj, Commented Nov 28, 2017 at 17:42

hpaulj · Accepted Answer · 2017-11-28 18:19:32Z

nested_list = [[a, a + 1], [a + 2, a + 3]] produces 3 new arrays (the sums) plus a list of pointers to those arrays. That's just basic Python interpreter action.

b = np.array(nested_list): np.array is a complex compiled function, so without some serious digging it is hard to tell exactly what it does. My impression from previous use, and especially errors when components don't exactly match in size, is that it scans the input to determine the highest-dimensional array that it can create, and then plugs the pieces in, with type conversions if needed.
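The "scan, then plug the pieces in" behaviour described above can be imitated by hand. This is only a sketch of the idea in Python, not what the compiled np.array code actually does; the names outer, out_shape and manual are made up for illustration.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
nested_list = [[a, a + 1], [a + 2, a + 3]]

# Step 1: work out the result shape from the nesting and the leaf shape.
outer = (len(nested_list), len(nested_list[0]))
out_shape = outer + a.shape            # (2, 2, 2, 3)

# Step 2: allocate the result buffer once.
manual = np.empty(out_shape, dtype=a.dtype)

# Step 3: copy each leaf array into its slot.
for i, row in enumerate(nested_list):
    for j, leaf in enumerate(row):
        manual[i, j] = leaf

b = np.array(nested_list)
print(np.array_equal(manual, b))       # True
print(np.shares_memory(b, a))          # False: b owns a fresh copy of the data
```

Whatever the internals are, the observable result matches this single-allocation picture: b is a new 4d array that shares no memory with the inputs.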

It's easy to do time comparisons; harder to track memory use. But assuming that data copying is the biggest time consumer, time tests are probably a good proxy for memory use. And unless we are hitting memory errors, we are usually more concerned with time than memory use.
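Memory use can also be checked more directly, at least roughly. The sketch below uses the standard tracemalloc module and assumes the installed NumPy build reports its data-buffer allocations to it (as far as I know this has been the case since about 1.13). The helper peak_bytes is made up for this example, and larger arrays are used so the buffers dominate the numbers.

```python
import tracemalloc
import numpy as np

a = np.arange(600_000).reshape(200, 3000)
nested_list = [[a, a + 1], [a + 2, a + 3]]   # inputs are built before tracing


def peak_bytes(func):
    """Run func() and return (result, peak traced bytes during the call)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


b, peak_array = peak_bytes(lambda: np.array(nested_list))
c, peak_stacked = peak_bytes(
    lambda: np.vstack([np.hstack(row) for row in nested_list])
)

print("result size       :", b.nbytes, "bytes")
print("np.array peak     :", peak_array, "bytes")
print("vstack/hstack peak:", peak_stacked, "bytes")
```

If np.array allocates the result once, its peak should sit close to the result size, while the vstack/hstack version also has to hold the two intermediate hstack results when the final array is allocated.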

In [565]: alist = [[a,a+1],[a+2,a+3]]
In [566]: allist = [[a.tolist(), (a+1).tolist()],[(a+2).tolist(), (a+3).tolist()]]

In [567]: timeit np.array(alist)
6.74 µs ± 63.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [568]: timeit np.array(allist)
9.92 µs ± 286 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Working from the nested list of arrays is a bit faster than working from the pure list equivalent. It may be copying those arrays to the target as blocks.

Stacking in separate steps is noticeably slower, though this timing also includes creating the a+n arrays:

In [569]: timeit c = np.vstack([np.hstack([a, a + 1]), np.hstack([a + 2, a + 3])])
37.8 µs ± 39 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

np.stack(alist) produces the same result as np.array(alist) (with the default axis). Like hstack and vstack, it is built on concatenate:

In [570]: timeit np.stack(alist)
28.7 µs ± 262 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Including the a+n calculations in the timing may be fairer:

In [571]: %%timeit
     ...: alist = [[a,a+1],[a+2,a+3]]
     ...: np.stack(alist)
     ...: 
38.6 µs ± 509 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [572]: %%timeit
     ...: alist = [[a,a+1],[a+2,a+3]]
     ...: np.array(alist)
     ...: 
15.7 µs ± 177 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
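To rerun this comparison outside IPython, a plain script with the standard timeit module works as well. This is a sketch; build, candidates and timings are names chosen for the example, and the numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

a = np.arange(6).reshape(2, 3)


def build():
    # Creating the a+n arrays is included in every timing.
    return [[a, a + 1], [a + 2, a + 3]]


candidates = {
    "np.array": lambda: np.array(build()),
    "np.stack": lambda: np.stack(build()),
    "vstack+hstack": lambda: np.vstack([np.hstack(r) for r in build()]),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats of 2000 calls, converted to microseconds per call.
    best = min(timeit.repeat(func, number=2000, repeat=3))
    timings[name] = best / 2000 * 1e6
    print(f"{name:>14}: {timings[name]:.2f} µs per call")
```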

The new np.block was mentioned. It produces something different and is quite a bit slower:

In [573]: np.block(alist)
Out[573]: 
array([[0, 1, 2, 1, 2, 3],
       [3, 4, 5, 4, 5, 6],
       [2, 3, 4, 3, 4, 5],
       [5, 6, 7, 6, 7, 8]])
In [574]: timeit np.block(alist)
126 µs ± 2.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

block produces the same 2d array as the nested stacks:

np.vstack([np.hstack([a, a + 1]), np.hstack([a + 2, a + 3])])

np.array and np.stack produce a 4d array. It can be reshaped to 2d, but the order of elements is different. To match, we'd need to transpose before reshaping, e.g.:

In [590]: np.array(alist).transpose(0,2,1,3).reshape(4,6)
Out[590]: 
array([[0, 1, 2, 1, 2, 3],
       [3, 4, 5, 4, 5, 6],
       [2, 3, 4, 3, 4, 5],
       [5, 6, 7, 6, 7, 8]])
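The relationship between the three constructions can be checked in one place. A small sketch, assuming numpy 1.13 or later so that np.block is available:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
alist = [[a, a + 1], [a + 2, a + 3]]

stacked = np.vstack([np.hstack(row) for row in alist])   # shape (4, 6)
blocked = np.block(alist)                                # shape (4, 6)
four_d = np.array(alist)                                 # shape (2, 2, 2, 3)

# The axes of four_d are (block_row, block_col, row, col). Swapping the two
# middle axes gives (block_row, row, block_col, col), which merges into the
# 2d block layout when reshaped.
rearranged = four_d.transpose(0, 2, 1, 3).reshape(4, 6)

print(np.array_equal(blocked, stacked))                # True
print(np.array_equal(rearranged, stacked))             # True
print(np.array_equal(four_d.reshape(4, 6), stacked))   # False
```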

That np.block is about 3 times slower than the hstack+vstack combination is surprising. I never thought it would be that way.
NumPy 1.14 has a faster np.block implementation, which brings the cost down to about 1.7x slower (compared against np.vstack([np.hstack(alist[0]), np.hstack(alist[1])])). Using concatenate directly is a 2.5x speed boost over hstack and vstack; for small arrays, the cost of running Python functions dominates.
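For reference, "using concatenate directly" could look like the following for 2d inputs. This is a sketch of one way to write it, not necessarily the exact code that was timed in the comment.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
alist = [[a, a + 1], [a + 2, a + 3]]

# For 2d arrays, hstack joins along axis 1 and vstack joins along axis 0.
rows = [np.concatenate(row, axis=1) for row in alist]
direct = np.concatenate(rows, axis=0)

reference = np.vstack([np.hstack(alist[0]), np.hstack(alist[1])])
print(np.array_equal(direct, reference))   # True
```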
