How to convert a Keras ImageDataGenerator into a NumPy array?

Question

I'm working on a CNN model and would like to know how to convert the output of datagen.flow_from_directory() into a NumPy array. The object returned by datagen.flow_from_directory() is a DirectoryIterator.

Apart from ImageDataGenerator, is there any other way to fetch data from the directory?

img_width = 150
img_height = 150

datagen = ImageDataGenerator(rescale=1/255.0, validation_split=0.2)

train_data_gen = datagen.flow_from_directory(directory='/content/xray_dataset_covid19',
                                             target_size = (img_width, img_height),
                                             class_mode='binary',
                                             batch_size=16,
                                             subset='training')

vali_data_gen = datagen.flow_from_directory(directory='/content/xray_dataset_covid19',
                                             target_size = (img_width, img_height),
                                             class_mode='binary',
                                             batch_size=16,
                                             subset='validation')

bsquare · Accepted Answer · 2020-04-08 17:19:05Z

5

First Method:

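import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Values taken from the question; adjust them to your dataset
data_dir = '/content/xray_dataset_covid19'
img_width, img_height = 150, 150
batch_size = 16

datagen = ImageDataGenerator(rescale=1. / 255)

data_generator = datagen.flow_from_directory(
    data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')
data_list = []

# len(data_generator) is the number of batches in one pass over the directory
for _ in range(len(data_generator)):
    data = next(data_generator)   # a tuple (images, labels)
    data_list.append(data[0])

# data_array holds every image, with shape (num_images, img_height, img_width, 3).
# np.concatenate joins the batches along the first axis, even if the last batch is smaller.
data_array = np.concatenate(data_list)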

Alternatively, you can use PIL and NumPy to process the images yourself:

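from PIL import Image
import numpy as np

def image_to_array(file_path, img_width=150, img_height=150):
    # Force three channels so grayscale files also get a channel axis
    img = Image.open(file_path).convert('RGB')
    img = img.resize((img_width, img_height))  # PIL expects (width, height)
    # data is a float32 array of shape (height, width, channels) for a single image
    data = np.asarray(img, dtype='float32')
    return data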

Second Method: if your images are already in NumPy arrays, use ImageDataGenerator.flow, which takes NumPy arrays directly. This replaces the flow_from_directory call; all other code that uses the generator stays the same.
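As a rough sketch of the second method, the snippet below wraps in-memory arrays with ImageDataGenerator.flow. The helper name make_array_generator and the random placeholder data are assumptions made for illustration; only flow(x, y, batch_size=...) itself is the Keras call. TensorFlow is imported inside the helper so that the array preparation can run on its own.

```python
import numpy as np

def make_array_generator(x, y, batch_size=16):
    """Hypothetical helper: wrap in-memory arrays in a Keras batch generator."""
    # Imported lazily so the array preparation below runs without TensorFlow.
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    datagen = ImageDataGenerator(rescale=1 / 255.0)
    # flow() expects x as a rank-4 array: (num_images, height, width, channels)
    return datagen.flow(x, y, batch_size=batch_size)

# Placeholder data standing in for real images: 32 RGB images of 150x150 pixels.
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(32, 150, 150, 3)).astype("float32")
y = rng.integers(0, 2, size=(32,))
```

Calling make_array_generator(x, y) should then give a generator that can be passed to model.fit in place of the flow_from_directory one.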


2 Comments

Navin Over a year ago

Thank you, sir, for your answer

DragonflyRobotics Over a year ago

How can you also get the corresponding labels with this method?

mcagriaksoy · Accepted Answer · 2022-07-13 07:49:43Z

Use it like this; in terms of RAM usage, it is much more efficient than the other methods.

img_width = 150
img_height = 150

datagen = ImageDataGenerator(rescale=1/255.0, validation_split=0.2)

train_data_gen = datagen.flow_from_directory(directory='/content/xray_dataset_covid19',
                                             target_size = (img_width, img_height),
                                             class_mode='binary',
                                             batch_size=16,
                                             subset='training')

vali_data_gen = datagen.flow_from_directory(directory='/content/xray_dataset_covid19',
                                             target_size = (img_width, img_height),
                                             class_mode='binary',
                                             batch_size=16,
                                             subset='validation')

import numpy as np
# Draw each batch exactly once so images and labels stay paired: the generator
# shuffles by default, so separate passes for x and y would not line up.
train_batches = [next(train_data_gen) for _ in range(len(train_data_gen))]
x_train = np.concatenate([x for x, _ in train_batches])
y_train = np.concatenate([y for _, y in train_batches])

val_batches = [next(vali_data_gen) for _ in range(len(vali_data_gen))]
x_val = np.concatenate([x for x, _ in val_batches])
y_val = np.concatenate([y for _, y in val_batches])

Now x_train, y_train, x_val and y_val are plain NumPy arrays.
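The pattern of pulling images and labels out of the same batch can be wrapped in a small reusable function. The sketch below is only an illustration: generator_to_arrays and fake_batches are hypothetical names, and fake_batches is a stand-in that mimics how a Keras iterator yields (images, labels) tuples and cycles forever, so the example runs without a dataset.

```python
import numpy as np

def generator_to_arrays(gen, num_batches):
    """Collect num_batches (images, labels) batches into two aligned arrays."""
    xs, ys = [], []
    for _ in range(num_batches):
        x, y = next(gen)          # one draw per batch keeps x and y paired
        xs.append(x)
        ys.append(y)
    return np.concatenate(xs), np.concatenate(ys)

def fake_batches(n_samples=10, batch_size=4):
    """Hypothetical stand-in for a DirectoryIterator; image i is filled with the value i."""
    values = np.arange(n_samples, dtype="float32").reshape(n_samples, 1, 1, 1)
    images = values * np.ones((1, 2, 2, 3), dtype="float32")
    labels = np.arange(n_samples) % 2
    while True:
        for start in range(0, n_samples, batch_size):
            yield images[start:start + batch_size], labels[start:start + batch_size]

# 10 samples in batches of 4 means 3 batches per pass (4 + 4 + 2).
x_all, y_all = generator_to_arrays(fake_batches(), num_batches=3)
```

With a real iterator, len(train_data_gen) gives the number of batches in one pass, so generator_to_arrays(train_data_gen, len(train_data_gen)) should return every image and its label once.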

Shawn · Accepted Answer · 2021-07-30 03:22:30Z

0

You can iterate through the generator.

import numpy as np

def sample_from_generator(gen, nb_sample):
    """Collect the first nb_sample (image, label) pairs from a batch generator."""
    cur_x, cur_y = next(gen)

    # Preallocate the outputs using the per-sample shapes of the first batch
    X_sample = np.zeros((nb_sample,) + cur_x.shape[1:])
    Y_sample = np.zeros((nb_sample,) + cur_y.shape[1:])

    filled = 0
    while filled < nb_sample:
        # Trim the batch if it would overshoot nb_sample
        n = min(len(cur_x), nb_sample - filled)
        X_sample[filled:filled + n] = cur_x[:n]
        Y_sample[filled:filled + n] = cur_y[:n]
        filled += n
        if filled < nb_sample:
            cur_x, cur_y = next(gen)  # fetch the next batch only when needed
    return X_sample, Y_sample
