
On Kaggle I was given the input data folders.

#Training data
train_datagen = ImageDataGenerator(
        rescale=1./255,
        rotation_range=10,
        zoom_range=0.4,
        horizontal_flip=True,
        validation_split=0.01
        )

train_generator = train_datagen.flow_from_directory(
        '../input/chest-xray-covid19-pneumonia/Data/train',
        target_size=(256, 256),
        batch_size=32,
        class_mode='categorical',
        subset='training'
        )

I had to add some more images to this dataset, so I converted my train_generator to NumPy arrays. Note that calling next() separately for the images and the labels would pull them from different batches and desynchronize them, so both must come from the same call:

x_batches, y_batches = [], []
for _ in range(len(train_generator)):
    batch_x, batch_y = next(train_generator)
    x_batches.append(batch_x)
    y_batches.append(batch_y)
x_train = np.concatenate(x_batches)
y_train = np.concatenate(y_batches)

Then I concatenated the extra GAN-generated images onto these arrays:

gan_images = np.concatenate((x_train,t_x), axis=0)
gan_labels = np.concatenate((y_train,t_y), axis=0)
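As a sanity check on the merge, the concatenated shapes should add up along axis 0. A minimal sketch with small dummy arrays (the sizes here are placeholders standing in for the real 5094 + 100 images of 256×256×3):

```python
import numpy as np

# Tiny stand-ins for the real arrays (real images are 256x256x3; counts shrunk here)
x_train = np.zeros((50, 8, 8, 3), dtype=np.float32)
y_train = np.zeros((50, 3), dtype=np.float32)
t_x = np.ones((10, 8, 8, 3), dtype=np.float32)   # GAN images
t_y = np.ones((10, 3), dtype=np.float32)         # GAN labels

# Concatenate along the sample axis; all trailing dimensions must match
gan_images = np.concatenate((x_train, t_x), axis=0)
gan_labels = np.concatenate((y_train, t_y), axis=0)

print(gan_images.shape)  # (60, 8, 8, 3)
print(gan_labels.shape)  # (60, 3)
```

If the trailing dimensions of x_train and t_x disagree (e.g. the GAN output is a different resolution), np.concatenate raises a ValueError instead of silently misaligning the data.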

Now, how can I convert this back to the train_generator format?

Type of train_generator is keras.preprocessing.image.DirectoryIterator

EDIT

As per the suggestion, I tried:

train_dataset = train_datagen.flow(x_train, y_train)
additional_gan_dataset = train_datagen.flow(t_x, t_y)
abc = np.concatenate((additional_gan_dataset, train_dataset), axis=0)

This gave an OOM error on Kaggle.

Another way I tried:

dataset = train_datagen.flow(gan_images, gan_labels)

history1 = model1.fit(dataset, validation_data=val_generator, verbose=1, epochs=500,
                      callbacks=[early_stopping, reduce_lr, learning_rate_reduction])

It runs, but the accuracy comes out very poor, so I suspect the images have not been merged properly. I had a total of 5094 images and generated another 100. Since these are batched iterators, I cannot verify the contents just by checking the length: len(train_dataset) gives 160, and after merging it gives 163. How can I fix this? How do I inspect these batched datasets properly?
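For what it's worth, those lengths do line up with the image counts: a Keras iterator's len() is the number of batches, i.e. ceil(samples / batch_size). A quick check, assuming batch_size=32 as in the question:

```python
import math

batch_size = 32  # as used in flow_from_directory above
# len() of a Keras iterator is the number of batches: ceil(samples / batch_size)
print(math.ceil(5094 / batch_size))          # 160  (matches len(train_dataset))
print(math.ceil((5094 + 100) / batch_size))  # 163  (matches the merged length)
```

So the merged iterator is the right size, and the poor accuracy likely comes from something else (for example, images and labels extracted out of sync) rather than from the merge itself.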

  • Your code is incorrect, it should be dataset = train_datagen.flow(gan_images, gan_labels) Commented Sep 28, 2022 at 20:55

1 Answer


Use the flow(x_array, y_array) method of your ImageDataGenerator instance:

dataset = train_datagen.flow(gan_images, gan_labels)

That said, unless you need the methods of ImageDataGenerator, or really need a dataset object, you can just pass the arrays directly to .fit().


8 Comments

Hi @Djinn, I need to apply it here: history = model3.fit(train_generator, validation_data=val_generator, verbose=1, epochs=500, callbacks=[early_stopping, reduce_lr, learning_rate_reduction]). Hence I am looking for this. I have applied it the way you told me; let's see.
If that's all you need, with no processing on the dataset after creating it, you still don't necessarily need to convert it to a dataset object. That's just extra overhead. But you could also add images directly to the dataset too, I believe, without needing to convert to arrays.
I tried a LOT. I generated the images using a GAN. Unfortunately, I am not able to add them to the Kaggle training-set folder directly, hence this approach!
Place the new images in a dataset, then use dataset.concatenate(additional_dataset)
ImageDataGenerator is not an object, it's a class, one with an object that you didn't initialize. Follow whatever guide you're following to create your datagens and try to match with the answer. It's exactly like what's in your question.
