How can I put many 2D numpy arrays fast in a 4D numpy array?

Question

I have about 150,000 images which I want to load in a numpy array of shape [index][y][x][channel]. Currently, I do it like this:

images = numpy.zeros((len(data), 32, 32, 1))
for i, fname in enumerate(data):
    img = scipy.ndimage.imread(fname, flatten=False, mode='L')
    img = img.reshape((1, img.shape[0], img.shape[1], 1))
    for y in range(32):
        for x in range(32):
            images[i][y][x][0] = img[0][y][x][0]

This works, but I think there must be a better solution than iterating over the elements. I could get rid of the reshaping, but this would still leave the two nested for-loops.

What is the fastest way to achieve the same 4D images array, given 150,000 images that need to be loaded into it?

MSeifert · Accepted Answer · 2017-01-18 00:25:31Z

2

Generally you don't need to copy single elements when dealing with NumPy arrays. You can assign a whole slice at once by specifying the axes you want to copy to and/or from, provided the shapes are equal or broadcastable:

images[i,:,:,0] = img[0,:,:,0]

instead of your loops. In fact you don't need the reshape at all:

images[i,:,:,0] = scipy.ndimage.imread(fname, flatten=False, mode='L')

Each : selects an entire axis (that axis is kept whole rather than indexed), and NumPy supports array-to-array assignment, for example:

>>> a = np.zeros((3,3,3))
>>> a[0, :, :] = np.ones((3, 3))
>>> a
array([[[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]],

       [[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]])

or

>>> a = np.zeros((3,3,3))
>>> a[:, 1, :] = np.ones((3, 3))
>>> a
array([[[ 0.,  0.,  0.],
        [ 1.,  1.,  1.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 1.,  1.,  1.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 1.,  1.,  1.],
        [ 0.,  0.,  0.]]])
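As a rough check that slice assignment really does replace the nested loops from the question, here is a minimal sketch. It assumes each loaded image is a 2D 32x32 array; the `fake_images` list is synthetic data standing in for the files, not the asker's actual loader.

```python
import numpy as np

# Synthetic stand-ins for loaded greyscale images (assumption: each is 2D, 32x32).
rng = np.random.default_rng(0)
fake_images = [rng.integers(0, 256, size=(32, 32), dtype=np.uint8) for _ in range(5)]

# Element-by-element copy, as in the question.
looped = np.zeros((len(fake_images), 32, 32, 1))
for i, img in enumerate(fake_images):
    for y in range(32):
        for x in range(32):
            looped[i][y][x][0] = img[y][x]

# One slice assignment per image: the two inner loops disappear.
sliced = np.zeros((len(fake_images), 32, 32, 1))
for i, img in enumerate(fake_images):
    sliced[i, :, :, 0] = img

same = np.array_equal(looped, sliced)
print(same)  # True
```

The outer loop over files remains, but the per-pixel work is done inside NumPy instead of in Python.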

edited Jan 18, 2017 at 0:25

answered Jan 18, 2017 at 0:00

MSeifert


2 Comments

Eric Over a year ago

Not sure that that'll work with flatten=False, because (x, y, 1) won't broadcast to (x, y). Isn't images[i] = scipy.ndimage.imread(fname, flatten=False, mode='L') enough anyway?

MSeifert Over a year ago

@Eric Good question, I assumed that mode='L' just defines the "bits" of the greyscale (8-bit) but will return a 2D array.
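The shape concern raised in these comments can be illustrated with plain arrays. This is only a sketch: `img_2d` and `img_3d` are made-up arrays representing the two shapes a loader might return, and `assignment_fails` is a hypothetical helper, not part of NumPy.

```python
import numpy as np

images = np.zeros((3, 32, 32, 1))
img_2d = np.ones((32, 32))       # greyscale image without a channel axis
img_3d = np.ones((32, 32, 1))    # same image with a trailing channel axis

# Matching shapes: each assignment works.
images[0, :, :, 0] = img_2d      # target slot has shape (32, 32)
images[1] = img_3d               # target slot has shape (32, 32, 1)

def assignment_fails(target, value):
    """Hypothetical helper: True if NumPy cannot broadcast value into target."""
    try:
        target[...] = value
    except ValueError:
        return True
    return False

# Mismatched shapes: neither direction broadcasts.
mismatch_a = assignment_fails(images[2, :, :, 0], img_3d)  # (32, 32, 1) into (32, 32)
mismatch_b = assignment_fails(images[2], img_2d)           # (32, 32) into (32, 32, 1)
```

So which index expression to use depends on whether the loader returns a 2D or a 3D array; checking `img.shape` once before the loop settles it.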

hpaulj · Answer · 2017-01-18 03:16:23Z

0

Essentially there are two approaches. The first preallocates the result array and fills it in a loop:

res = np.zeros((<correct shape>), dtype)
for i in range(...):
   img = <load>
   <reshape if needed>
   res[i,...] = img

If you've chosen the initial shape of res correctly, you should be able to copy each image array into its slot without an inner loop or much reshaping.
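A runnable version of the preallocation pattern might look like the sketch below. `load_image` is a hypothetical stand-in that generates synthetic data, and the 32x32 size and uint8 dtype are assumptions taken from the question rather than anything the answer prescribes.

```python
import numpy as np

def load_image(index, height=32, width=32):
    """Hypothetical loader: returns a synthetic 2D uint8 greyscale image."""
    rng = np.random.default_rng(index)
    return rng.integers(0, 256, size=(height, width), dtype=np.uint8)

n_images = 4
# Preallocate the final 4D array once, with the dtype of the image data.
res = np.zeros((n_images, 32, 32, 1), dtype=np.uint8)
for i in range(n_images):
    img = load_image(i)      # shape (32, 32)
    res[i, ..., 0] = img     # fill slot i; the Ellipsis covers the y and x axes
```

Matching the dtype to the image data (here uint8 instead of the default float64) also cuts memory use by a factor of eight, which matters at 150,000 images.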

The other approach appends each image to a list:

alist = []
for _ in range(...):
   img = <load>
   <reshape>
   alist.append(img)
res = np.array(alist)

This collects all component arrays into a list and uses np.array to join them into one array with a new dimension at the start. np.stack gives a little more control over where that new axis is placed.
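The list-based pattern can be sketched the same way. Again `load_image` is a hypothetical loader producing synthetic 32x32 data; the trailing channel axis is added afterwards with `np.newaxis` to reach the 4D layout from the question.

```python
import numpy as np

def load_image(index, height=32, width=32):
    """Hypothetical loader: returns a synthetic 2D uint8 greyscale image."""
    rng = np.random.default_rng(index)
    return rng.integers(0, 256, size=(height, width), dtype=np.uint8)

# Collect the 2D arrays in a plain Python list.
alist = [load_image(i) for i in range(4)]

res = np.array(alist)                     # new leading axis: shape (4, 32, 32)
res_4d = res[..., np.newaxis]             # add channel axis: shape (4, 32, 32, 1)

# np.stack lets the new axis go somewhere other than the front.
stacked_first = np.stack(alist, axis=0)   # shape (4, 32, 32), same as np.array
stacked_last = np.stack(alist, axis=-1)   # shape (32, 32, 4)
```

Note that this variant briefly holds both the list and the joined array in memory, so the preallocated version may be the safer choice for very large image sets.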

answered Jan 18, 2017 at 3:16

hpaulj

