
I'm trying to build a numpy array of arrays of arrays with the code below.

It raises:

ValueError: setting an array element with a sequence.

My guess is that in numpy I need to declare the arrays as multi-dimensional from the beginning, but I'm not sure.

How can I fix the code below so that I can build an array of arrays of arrays?

from PIL import Image
import pickle
import os
import numpy

indir1 = 'PositiveResize'

trainimage = numpy.empty(2)
trainpixels = numpy.empty(80000)
trainlabels = numpy.empty(80000)
validimage = numpy.empty(2)
validpixels = numpy.empty(10000)
validlabels = numpy.empty(10000)
testimage = numpy.empty(2)
testpixels = numpy.empty(10408)
testlabels = numpy.empty(10408)

i=0
tr=0
va=0
te=0
for (root, dirs, filenames) in os.walk(indir1):
    print 'hello'
    for f in filenames:
            try:
                    im = Image.open(os.path.join(root,f))
                    Imv=im.load()
                    x,y=im.size
                    pixelv = numpy.empty(6400)
                    ind=0
                    for i in range(x):
                            for j in range(y):
                                    temp=float(Imv[j,i])
                                    temp=float(temp/255.0)
                                    pixelv[ind]=temp
                                    ind+=1
                    if i<40000:
                            trainpixels[tr]=pixelv
                            tr+=1
                    elif i<45000:
                            validpixels[va]=pixelv
                            va+=1
                    else:
                            testpixels[te]=pixelv
                            te+=1
                    print str(i)+'\t'+str(f)
                    i+=1
            except IOError:
                    continue

trainimage[0]=trainpixels
trainimage[1]=trainlabels
validimage[0]=validpixels
validimage[1]=validlabels
testimage[0]=testpixels
testimage[1]=testlabels
  • Are the images all the same size? If so, you can pre-declare a 3D array. If not, you can declare a 1D array of dtype object (numpy.object in older NumPy versions); by definition, its elements can be set to a sequence or to any Python object of your choosing. Commented Aug 5, 2014 at 16:48
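
A minimal sketch of the object-array option from the comment above, for images of differing sizes. The shapes and values are made up for illustration and are not taken from the question's data:

```python
import numpy as np

# A 1-D array whose slots hold arbitrary Python objects.
images = np.empty(3, dtype=object)

# Each slot can hold an array of a different shape (hypothetical data).
images[0] = np.zeros((2, 3))
images[1] = np.ones((4, 4))
images[2] = np.arange(5)

shapes = [img.shape for img in images]
```

An object array gives up vectorized operations across images, so it is probably only worth using when the image sizes really do differ.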

2 Answers


Don't try to cram your entire object into a single numpy array. If you have distinct things, use a numpy array for each one, then use an appropriate data structure to hold them together.

For instance, if you want to do computations across images then you probably want to just store the pixels and labels in separate arrays.

import numpy as np

# One 80x80 slot per image, plus one label per image
trainpixels = np.empty([10000, 80, 80])
trainlabels = np.empty(10000)
for i in range(10000):
    trainpixels[i] = ...
    trainlabels[i] = ...

To access an individual image's data:

imagepixels = trainpixels[253]
imagelabel = trainlabels[253]

You can also easily compute summary statistics across the images.

meanimage = np.mean(trainpixels, axis=0)
meanlabel = np.mean(trainlabels)
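
To tie this back to the error in the question, here is a minimal sketch of why a flat array fails and a pre-shaped one works. The sizes are deliberately small, and the random vector is an assumed stand-in for one flattened 80x80 image:

```python
import numpy as np

n_images, n_pixels = 4, 6400  # illustrative sizes, not the question's

# Stand-in for one flattened 80x80 image (hypothetical data).
pixelv = np.random.rand(n_pixels)

# 1-D array: each slot holds exactly one float, as in the question.
flat = np.empty(n_images)
try:
    flat[0] = pixelv  # tries to put 6400 values into one slot
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# 2-D array: each row has room for a whole flattened image.
trainpixels = np.empty((n_images, n_pixels))
trainpixels[0] = pixelv
```

The same idea extends to a shape of (n_images, 80, 80) if you would rather keep each image two-dimensional.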

If you really want all the data to be in the same object, you should probably use a struct array as Eelco Hoogendoorn suggests. Some example usage:

# Construction and assignment
# (np.int was removed in NumPy 1.24; the builtin int is equivalent)
trainimages = np.empty(10000, dtype=[('label', int), ('pixel', int, (80,80))])
for i in range(10000):
    trainimages['label'][i] = ...
    trainimages['pixel'][i] = ...

# Summary statistics
meanimage = np.mean(trainimages['pixel'], axis=0)
meanlabel = np.mean(trainimages['label'])

# Accessing a single image
image = trainimages[253]
imagepixels, imagelabel = trainimages[['pixel', 'label']][253]
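
A small runnable sketch of the same structured-array idea, assuming five images and a float pixel field (an assumption made here because the question scales pixels to the range 0 to 1; the fill values are invented):

```python
import numpy as np

# Illustrative sizes; field names follow the example above.
n_images, side = 5, 80
dtype = [('label', np.int64), ('pixel', np.float64, (side, side))]
trainimages = np.zeros(n_images, dtype=dtype)

for i in range(n_images):
    trainimages['label'][i] = i % 2                            # hypothetical label
    trainimages['pixel'][i] = np.full((side, side), i / 10.0)  # hypothetical pixels

# Accessing one record and its fields by name
record = trainimages[3]
imagelabel = record['label']
imagepixels = record['pixel']

# Statistics across all images still work on the field view
meanimage = np.mean(trainimages['pixel'], axis=0)
```

Reading fields by name from a single record avoids relying on field order when unpacking.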

Alternatively, if you want to process each one separately, you could store each image's data in separate arrays and bind them together in a tuple or dictionary, then store all of that in a list.

trainimages = []
for i in range(10000):
    pixels = ...
    label = ...
    image = (pixels, label)
    trainimages.append(image)

Now, to access a single image's data:

imagepixels, imagelabel = trainimages[253]

This makes it more intuitive to access a single image, but because all the data is not in one big numpy array you don't get easy access to functions that work across images.
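
If you start with the list layout and later need cross-image operations, one possible approach is to stack the per-image arrays on demand. This sketch assumes every image has the same shape and uses invented data:

```python
import numpy as np

trainimages = []
for i in range(3):
    pixels = np.full((80, 80), float(i))  # hypothetical per-image data
    label = i                             # hypothetical label
    trainimages.append((pixels, label))

# Stack into one array per kind of data; requires equal image shapes.
allpixels = np.stack([pixels for pixels, _ in trainimages])
alllabels = np.array([label for _, label in trainimages])

meanimage = allpixels.mean(axis=0)
```

Note that np.stack copies the data, so for large datasets it may be cheaper to allocate the big array up front.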


2 Comments

While separate arrays may be preferable, they aren't a necessity; you can also use a struct array, i.e.: data = np.empty(10000, dtype=[('labels', int), ('pixels', int, (80,80))])
@EelcoHoogendoorn Good solution as well. I always forget about struct arrays.

Refer to the examples in numpy.empty:

>>> np.empty([2, 2])
array([[ -9.74499359e+001,   6.69583040e-309],
       [  2.13182611e-314,   3.06959433e-309]])         # uninitialized values; output varies

Give your array a shape with the N dimensions you need:

testpixels = numpy.empty([96, 96])
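
When the array has to hold many images, the extra dimension can go first or last. A rough sketch of both layouts, assuming 80x80 images and a small illustrative count:

```python
import numpy as np

n_images, side = 10, 80  # illustrative count, not the real dataset size

# Image index first: testpixels[k] is the k-th image.
testpixels = np.zeros((n_images, side, side))
testpixels[3] = np.ones((side, side))
image_first = testpixels[3]

# Image index last: the k-th image is a slice along the final axis.
testpixels_last = np.zeros((side, side, n_images))
testpixels_last[:, :, 3] = np.ones((side, side))
image_last = testpixels_last[:, :, 3]
```

Putting the image index first usually reads more naturally, since a single integer index then returns a whole image.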

5 Comments

But testpixels contains many arrays (about 10000 instances) of, say, [80,80] pixel matrices. How would I declare testpixels in that case?
testpixels = numpy.empty([80,80,10000])
testlabels on the other hand is just an array of integers (not array of arrays like testpixels). Then, if I were to have testimage to have testpixels and testlabels as its two elements, how should I define testimage?
You can create aliases of the views by slicing. See docs.scipy.org/doc/numpy/reference/arrays.indexing.html -- Maybe testpixels = numpy.empty([10000,80,80]); testimage = testpixels[0]
Sorry, slicing looks like it's about accessing the index in different ways, but I'm not sure how that can be applied to set testimage[0] to [80,80,10000] dimensions while setting testimage[1] to an array of integers.
