Loading images from a directory into a numpy array

Question

I'm having trouble loading photos from a folder. Following the TensorFlow documentation, the downloaded training set is an ndarray whose shape is (60000, 28, 28). When I load photos from folders instead, I can't figure out how to get the same layout. I should also mention that these photos vary in size. I would like my ndarray to have the same kind of shape as Fashion MNIST (x, y, z).

def loadFiles(path):
    trainImages = []
    for r, d, f in os.walk(path):
        for file in f:
            img = cv2.imread(r + "\\" + file, cv2.IMREAD_GRAYSCALE)
            trainImages.append(img)

    trainImagesNumpy = np.ndarray(trainImages)
    return trainImagesNumpy

train = loadFiles(trainPath)

I am using TensorFlow 2.1.0 and Python 3.x.

Thanks in advance for your help.

Nikhil Kumar · Accepted Answer · 2020-08-23 17:15:03Z

Since you say your images have different sizes, resize them as you read them from the directory, and then append them to trainImages.

I'm suggesting two options:

Option 1: Modify loadFiles as follows

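import os

import cv2
import numpy as np

def loadFiles(path):
    trainImages = []
    for r, d, f in os.walk(path):
        for file in f:
            filepath = os.path.join(r, file)
            img = cv2.imread(filepath, cv2.IMREAD_GRAYSCALE)
            if img is None:
                # cv2.imread returns None for files it cannot decode
                continue

            # Resize every image to (28, 28) so they all share one shape
            img = cv2.resize(img, (28, 28), interpolation=cv2.INTER_CUBIC)

            trainImages.append(img)

    # np.array stacks the equally sized images into shape (N, 28, 28)
    trainImagesNumpy = np.array(trainImages)
    return trainImagesNumpy

train = loadFiles(trainPath)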
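
To see why the resize step matters for the final shape, here is a minimal sketch that uses synthetic arrays as stand-ins for decoded grayscale images (no files are read, and the sizes are made up for illustration). Equally sized images stack into one (N, 28, 28) array, while mixed sizes cannot be stacked at all.

```python
import numpy as np

# Synthetic stand-ins for grayscale images already resized to 28x28.
images = [np.zeros((28, 28), dtype=np.uint8) for _ in range(5)]

# np.stack adds a new leading axis; np.array(images) gives the same result here.
batch = np.stack(images)
print(batch.shape)  # (5, 28, 28)

# Images of different sizes (hypothetical 28x28 and 32x40) cannot be stacked.
mixed = [np.zeros((28, 28), dtype=np.uint8), np.zeros((32, 40), dtype=np.uint8)]
try:
    np.stack(mixed)
    stack_failed = False
except ValueError:
    # np.stack requires every input array to have the same shape
    stack_failed = True
print(stack_failed)  # True
```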

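You can use other interpolation strategies for resizing; for example, cv2.INTER_AREA is the one the OpenCV documentation recommends for shrinking images. Check out the OpenCV Python documentation for the full list.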

Also, using os.path.join is good practice to join base directory path and file path, as it is OS independent. It automatically takes care of the filepath separators in Windows (backslash) or Unix/Linux (forward slash).
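
As a small illustration of the os.walk plus os.path.join pattern, the sketch below builds a throwaway directory tree and collects every file path in it. The helper name list_files and the file names are hypothetical, chosen only for this example.

```python
import os
import tempfile

def list_files(root):
    """Return the full path of every file under root, sorted."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # os.path.join inserts the separator that is correct for this OS
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)

# Build a temporary tree to walk (file names are made up).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "cats"))
    for rel in ("a.png", os.path.join("cats", "b.png")):
        open(os.path.join(root, rel), "w").close()
    found = [os.path.relpath(p, root) for p in list_files(root)]

print(found)
```

The same code runs unchanged on Windows and on Unix-like systems, which is the point of avoiding a hard-coded "\\".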

Refer: cv2.resize

Option 2: Use the ImageDataGenerator class in Keras

There are two advantages to using this:

It loads data in batches.
You can perform data augmentation very easily using inbuilt parameters.

Organize your data into train, validation and test directories. Each of the directories must contain subdirectories for each of the n classes.

The directory tree will look as follows (say you are doing a binary classification of cats vs dogs):

.
├── test
│   ├── cats
│   └── dogs
├── train
│   ├── cats
│   └── dogs
└── validation
    ├── cats
    └── dogs

Then initialize a data generator, rescaling the pixel values from the 0-255 range to the 0-1 range if you desire.

from tensorflow import keras
datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
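
The rescale argument multiplies every pixel value by the given factor. A rough NumPy equivalent, using a tiny made-up array rather than real image data, looks like this:

```python
import numpy as np

# Three example pixel values covering the uint8 range (made up for illustration).
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Convert to float first, then multiply by the same factor passed as rescale.
scaled = pixels.astype("float32") * (1.0 / 255)

print(scaled.dtype)  # float32
```

After this step the values lie between 0 and 1, which is the range many Keras examples feed to a network.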

Then read the training, validation and test images as batches from the flow_from_directory method.

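# color_mode='grayscale' yields single-channel batches; the default is 'rgb'
train = datagen.flow_from_directory('data/train', target_size=(28, 28), color_mode='grayscale', batch_size=32)
validation = datagen.flow_from_directory('data/validation', target_size=(28, 28), color_mode='grayscale', batch_size=32)
test = datagen.flow_from_directory('data/test', target_size=(28, 28), color_mode='grayscale', batch_size=32)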

Once you've executed the above code, make sure it tells you it's found the correct number of images with the correct number of classes.

You can then pass train, validation and test batches directly to the fit method in your keras model. Make sure you specify the number of steps_per_epoch and validation_steps while training. This is because generators run forever, continuously generating images, so fit needs to know when to stop. Make sure you provide the steps argument to the predict method as well, for the same reason.
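
A common way to pick these step counts is to divide the number of images by the batch size and round up, so that each image is seen once per epoch. The sketch below assumes hypothetical dataset sizes (2000 training and 500 validation images), and the helper name steps_for is made up for this example.

```python
import math

def steps_for(num_images, batch_size):
    """Batches needed to see every image once; the last batch may be smaller."""
    return math.ceil(num_images / batch_size)

# Hypothetical dataset sizes, for illustration only.
steps_per_epoch = steps_for(2000, 32)
validation_steps = steps_for(500, 32)

print(steps_per_epoch, validation_steps)  # 63 16

# With an already compiled model (not defined here), the values would be passed as:
# model.fit(train, steps_per_epoch=steps_per_epoch,
#           validation_data=validation, validation_steps=validation_steps)
```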

Refer: Keras docs

Just a caution - cv2 reads in images as BGR as opposed to RGB. You can convert them to RGB with img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).
@GerryP Good point to keep in mind. However, it seems the OP is using grayscale images (mentions same shape as images in Fashion MNIST).
