
I have a numpy array holding many (200 in this example) monochromatic 64x64 pixel images, so it has the shape:

>>> a.shape
(200L, 1L, 64L, 64L)

I want to split these images into 3 new arrays, a1, a2 and a3, containing 80%, 10% and 10% of the images respectively. The images in each new array should not be consecutive in a. I am currently doing it the following way:

import numpy as np
import random

a = --read images from file--

# numpy is imported as np above, so the arrays must be created via np
a1 = np.empty((0,1,64,64))
a2 = np.empty((0,1,64,64))
a3 = np.empty((0,1,64,64))

for i in range(200): #200 is the number of images
    temp = a[-1]
    a = np.delete(a,-1,0)
    rand = random.random()
    if rand < 0.8:
        a1 = np.append(a1,[temp],0)
    elif rand < 0.9:
        a2 = np.append(a2,[temp],0)
    else:
        a3 = np.append(a3,[temp],0)

I am trying to emulate pop and append, which run in O(1) time on lists, but does the same hold for numpy arrays? Is there a more efficient (faster) way to do this for a large number (thousands) of images?

1 Answer

Here's a one-liner using np.vsplit:

a1,a2,a3 = np.vsplit(a[np.random.permutation(a.shape[0])],(160,180))
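
The split points (160,180) in the one-liner are 80% and 90% of the 200 images. For other array sizes they could be derived from the fractions instead of being hard-coded. Here is a minimal sketch of that idea; the helper name random_split and its fractions and seed parameters are my own illustration, not part of NumPy or of the original answer:

```python
import numpy as np

def random_split(a, fractions=(0.8, 0.1), seed=None):
    """Shuffle `a` along axis 0 and split it into len(fractions) + 1 parts.

    Hypothetical helper: the last part receives whatever is left over.
    """
    rng = np.random.RandomState(seed)
    n = a.shape[0]
    # Cumulative fractions -> integer split points, e.g. [160, 180] for n = 200.
    # Rounding (rather than truncating) guards against floating-point error.
    cut_points = np.round(np.cumsum(fractions) * n).astype(int).tolist()
    return np.vsplit(a[rng.permutation(n)], cut_points)

a = np.zeros((200, 1, 64, 64))
a1, a2, a3 = random_split(a, seed=0)
print(a1.shape, a2.shape, a3.shape)
# (160, 1, 64, 64) (20, 1, 64, 64) (20, 1, 64, 64)
```

Passing a fixed seed makes the split reproducible, which can be handy if the three parts are meant to be used as train/validation/test sets.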

1) Shape check:

In [205]: a = np.random.rand(200,1,64,64)

In [206]: a1,a2,a3 = np.vsplit(a[np.random.permutation(a.shape[0])],(160,180))

In [207]: a.shape
Out[207]: (200, 1, 64, 64)

In [208]: a1.shape
Out[208]: (160, 1, 64, 64)

In [209]: a2.shape
Out[209]: (20, 1, 64, 64)

In [210]: a3.shape
Out[210]: (20, 1, 64, 64)

2) Value check on toy data, to make sure we are picking random images and not consecutive ones for the split:

In [212]: a
Out[212]: 
array([[5, 8, 4],
       [7, 7, 6],
       [3, 2, 7],
       [1, 4, 8],
       [4, 1, 0],
       [2, 1, 3],
       [6, 5, 2],
       [2, 4, 5],
       [6, 6, 5],
       [5, 2, 5]])

In [213]: a1,a2,a3 = np.vsplit(a[np.random.permutation(a.shape[0])],(6,8))

In [214]: a1
Out[214]: 
array([[1, 4, 8],
       [7, 7, 6],
       [6, 6, 5],
       [2, 4, 5],
       [4, 1, 0],
       [5, 2, 5]])

In [215]: a2
Out[215]: 
array([[3, 2, 7],
       [2, 1, 3]])

In [216]: a3
Out[216]: 
array([[6, 5, 2],
       [5, 8, 4]])
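
On the complexity part of the question: unlike list.append and list.pop, np.append and np.delete return a new array on every call rather than modifying the input, so each call copies all existing elements and the loop in the question does O(n) work per image. A small sketch that illustrates this (the array sizes are made up for the demo):

```python
import numpy as np

a1 = np.zeros((3, 1, 64, 64))     # three images already collected
image = np.ones((1, 64, 64))      # one more image to add

grown = np.append(a1, [image], 0)

# a1 is untouched; np.append allocated a fresh array and copied into it.
print(a1.shape, grown.shape)      # (3, 1, 64, 64) (4, 1, 64, 64)
print(np.shares_memory(a1, grown))

# If a loop is really needed, collecting into a Python list and converting
# once at the end avoids re-copying the accumulated data on every step.
parts = []
for _ in range(5):
    parts.append(image)           # amortized O(1), as for any list
stacked = np.array(parts)         # a single copy at the end
print(stacked.shape)              # (5, 1, 64, 64)
```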

1 Comment

Nice, simple solution, thanks! Seems fast enough with O(n) complexity in my simple tests. Btw, I know that it was not in my original question but what about the memory? This ends up with both a and the split pieces in memory, is there a way to do this without replicating a?
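
One possible way to avoid the second copy, sketched under the assumption that the original order of a does not need to be preserved: shuffle a in place with np.random.shuffle, then split it. Indexing with a permutation array (as in the one-liner) makes a copy, whereas splitting an already shuffled array only slices it, so the pieces are views into a:

```python
import numpy as np

a = np.arange(10 * 3).reshape(10, 3)   # toy stand-in for the image array

np.random.shuffle(a)                   # permutes the rows of a in place
a1, a2, a3 = np.vsplit(a, (6, 8))      # plain slices of a, no data copied

print(a1.shape, a2.shape, a3.shape)    # (6, 3) (2, 3) (2, 3)
# Each piece shares its buffer with a instead of owning a copy.
print(all(np.shares_memory(a, part) for part in (a1, a2, a3)))
```

Because the pieces are views, writing to a1, a2 or a3 also changes a, and a stays alive for as long as any of the pieces is referenced.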
