I'm working on a CNN model and I'd like to know how to convert the output of datagen.flow_from_directory() into a NumPy array. The object returned by datagen.flow_from_directory() is a DirectoryIterator.

Apart from ImageDataGenerator, is there any other way to fetch data from a directory?

img_width = 150
img_height = 150

datagen = ImageDataGenerator(rescale=1/255.0, validation_split=0.2)

train_data_gen = datagen.flow_from_directory(directory='/content/xray_dataset_covid19',
                                             target_size = (img_width, img_height),
                                             class_mode='binary',
                                             batch_size=16,
                                             subset='training')

vali_data_gen = datagen.flow_from_directory(directory='/content/xray_dataset_covid19',
                                             target_size = (img_width, img_height),
                                             class_mode='binary',
                                             batch_size=16,
                                             subset='validation')

3 Answers

5

First Method:

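import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1. / 255)

# data_dir, img_height, img_width and batch_size must be defined beforehand
data_generator = datagen.flow_from_directory(
    data_dir,
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical')
data_list = []

# len(data_generator) is the number of batches in one pass over the data
for _ in range(len(data_generator)):
    data = next(data_generator)  # data is an (images, labels) tuple
    data_list.append(data[0])

# now, data_array is the numeric data of all images;
# concatenate rather than asarray, because the last batch may be smaller
data_array = np.concatenate(data_list)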

Alternatively, you can use PIL and NumPy to process each image yourself:

from PIL import Image
import numpy as np

def image_to_array(file_path):
    img = Image.open(file_path)
    # PIL's resize takes (width, height)
    img = img.resize((img_width, img_height))
    # data has shape (height, width, channels) for a single colour image
    data = np.asarray(img, dtype='float32')
    return data
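To go from a single file to a whole directory without ImageDataGenerator, you also need the labels. Here is a minimal sketch, assuming the same layout that flow_from_directory expects (one subfolder per class). list_labeled_files and IMAGE_EXTENSIONS are hypothetical names, not part of Keras or PIL; the helper only collects file paths and integer labels, so each path can then be passed to image_to_array:

```python
from pathlib import Path

# Assumption: which file extensions count as images; adjust as needed
IMAGE_EXTENSIONS = {'.png', '.jpg', '.jpeg', '.bmp'}

def list_labeled_files(root):
    """Return (paths, labels, class_names) for a root/<class_name>/<image> layout."""
    root = Path(root)
    # Sort class folders by name so the label indices are reproducible
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    paths, labels = [], []
    for index, name in enumerate(class_names):
        for file_path in sorted((root / name).iterdir()):
            if file_path.suffix.lower() in IMAGE_EXTENSIONS:
                paths.append(str(file_path))
                labels.append(index)
    return paths, labels, class_names
```

For example, after paths, labels, class_names = list_labeled_files('/content/xray_dataset_covid19'), something like np.stack([image_to_array(p) for p in paths]) should give the image array, provided every image has the same number of channels (img.convert('RGB') can enforce that).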

Second Method: use ImageDataGenerator.flow, which takes NumPy arrays directly. It replaces the flow_from_directory call; all other code that uses the generator stays the same.

2 Comments

Thank you, sir, for your answer
How can you also get the corresponding labels with this method?
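On the labels question: as long as class_mode is set, each batch from the iterator is an (images, labels) tuple, so data[1] holds the labels for data[0]. Below is a minimal sketch of collecting both in a single pass. fake_batches is a hypothetical stand-in for the DirectoryIterator (so the snippet runs without Keras), and collect_batches is an illustrative helper, not a Keras function:

```python
import numpy as np

def fake_batches(n_images=10, batch_size=4, shape=(150, 150, 3)):
    """Hypothetical stand-in that yields (images, labels) batches forever."""
    while True:
        for start in range(0, n_images, batch_size):
            n = min(batch_size, n_images - start)
            images = np.zeros((n,) + shape, dtype='float32')
            labels = np.arange(start, start + n, dtype='float32')
            yield images, labels

def collect_batches(gen, n_batches):
    """Draw n_batches from gen once and keep images and labels paired."""
    xs, ys = [], []
    for _ in range(n_batches):
        x, y = next(gen)  # one call returns both parts of the batch
        xs.append(x)
        ys.append(y)
    return np.concatenate(xs), np.concatenate(ys)

images, labels = collect_batches(fake_batches(), n_batches=3)
print(images.shape, labels.shape)
```

With the real iterator you would call collect_batches(data_generator, len(data_generator)), since len() of the iterator is the number of batches per epoch.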
1

Use it like this; it is much more efficient than the other methods in terms of RAM usage.

img_width = 150
img_height = 150

datagen = ImageDataGenerator(rescale=1/255.0, validation_split=0.2)

train_data_gen = datagen.flow_from_directory(directory='/content/xray_dataset_covid19',
                                             target_size = (img_width, img_height),
                                             class_mode='binary',
                                             batch_size=16,
                                             subset='training')

vali_data_gen = datagen.flow_from_directory(directory='/content/xray_dataset_covid19',
                                             target_size = (img_width, img_height),
                                             class_mode='binary',
                                             batch_size=16,
                                             subset='validation')

import numpy as np

# Fetch each batch once so that images and labels stay paired
train_batches = [next(train_data_gen) for _ in range(len(train_data_gen))]
x_train = np.concatenate([x for x, _ in train_batches])
y_train = np.concatenate([y for _, y in train_batches])

val_batches = [next(vali_data_gen) for _ in range(len(vali_data_gen))]
x_val = np.concatenate([x for x, _ in val_batches])
y_val = np.concatenate([y for _, y in val_batches])

Now you can use x_train and y_train (and x_val and y_val) as NumPy arrays.
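Before materialising everything, it can help to estimate how much RAM the arrays will need, since all images are held in memory at once. Here is a rough sketch of the arithmetic, assuming float32 pixels (4 bytes each) and 3 colour channels. array_bytes is a hypothetical helper, and the image count of 1000 is a made-up figure for illustration, not the size of this dataset:

```python
def array_bytes(n_images, height, width, channels=3, bytes_per_value=4):
    """Bytes needed for an (n_images, height, width, channels) array."""
    return n_images * height * width * channels * bytes_per_value

n_images = 1000  # hypothetical count; train_data_gen.samples gives the real one
size = array_bytes(n_images, 150, 150)
print(f"{size / 1e6:.0f} MB")  # 270 MB
```

If the estimate is close to the RAM you have available, feeding the generator to the model directly is likely the safer choice.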

0

You can iterate through the generator.

import numpy as np

def sample_from_generator(gen, nb_sample):
    # Use the first batch to discover the image and label shapes
    cur_x, cur_y = next(gen)

    # shape[1:] also handles 1-D labels (e.g. class_mode='binary')
    X_sample = np.zeros((nb_sample,) + cur_x.shape[1:], dtype=cur_x.dtype)
    Y_sample = np.zeros((nb_sample,) + cur_y.shape[1:], dtype=cur_y.dtype)

    filled = 0
    while filled < nb_sample:
        # Trim the batch if it would overflow the remaining space
        n = min(len(cur_x), nb_sample - filled)
        X_sample[filled:filled + n] = cur_x[:n]
        Y_sample[filled:filled + n] = cur_y[:n]
        filled += n
        if filled < nb_sample:
            cur_x, cur_y = next(gen)
    return X_sample, Y_sample
