
Using Keras' ImageDataGenerator, we can save augmented images as PNG or JPG files:

    for X_batch, y_batch in datagen.flow(train_data, train_labels, batch_size=batch_size,
                                         save_to_dir='images', save_prefix='aug', save_format='png'):
        # each iteration writes the augmented batch to the 'images' directory as png files
        break

I have a dataset of shape (1600, 4, 100, 100), which means 1600 images with 4 channels of 100x100 pixels. How can I save the augmented data as a numpy array of shape (N, 4, 100, 100) instead of as individual image files?
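For context, a minimal sketch of how the generator is set up for this channels-first data (the augmentation arguments here are just placeholders):

    from keras.preprocessing.image import ImageDataGenerator

    # data_format='channels_first' because each sample has shape (4, 100, 100)
    datagen = ImageDataGenerator(rotation_range=20,
                                 horizontal_flip=True,
                                 data_format='channels_first')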

3 Comments
  • Do you want to save each batch in a file, like np.save('batch.npy', X_batch)? Commented Aug 10, 2017 at 13:52
  • I want to save all of the augmented data in one file. Commented Aug 10, 2017 at 14:02
  • You can't. Read the documentation: flow(x, y) takes numpy data & label arrays and generates batches of augmented/normalized data, yielding batches **indefinitely, in an infinite loop**. Although, you could probably extract only the first M batches and join them together. Commented Aug 10, 2017 at 14:33

1 Answer


Since you know the number of samples is 1600, you can stop datagen.flow() once that number is reached.

import numpy as np

augmented_data = []
num_augmented = 0
# shuffle=False keeps the augmented batches in the same order as train_data
for X_batch, y_batch in datagen.flow(train_data, train_labels, batch_size=batch_size, shuffle=False):
    augmented_data.append(X_batch)
    num_augmented += batch_size
    if num_augmented == train_data.shape[0]:  # stop once every sample has been augmented once
        break
# stack the batches into a single array of shape (1600, 4, 100, 100)
augmented_data = np.concatenate(augmented_data)
np.save(...)

Note that you should choose batch_size so that it divides the number of samples evenly (e.g. batch_size=10 for 1600 samples); otherwise the last batch would generate extra augmented images.
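If you also want to keep the labels aligned with the augmented images, a small variation of the loop above collects them as well; the filenames below are just placeholders:

    import numpy as np

    augmented_x, augmented_y = [], []
    num_augmented = 0
    for X_batch, y_batch in datagen.flow(train_data, train_labels, batch_size=batch_size, shuffle=False):
        augmented_x.append(X_batch)
        augmented_y.append(y_batch)
        num_augmented += X_batch.shape[0]
        if num_augmented >= train_data.shape[0]:
            break

    np.save('augmented_x.npy', np.concatenate(augmented_x))  # shape (1600, 4, 100, 100)
    np.save('augmented_y.npy', np.concatenate(augmented_y))  # labels in the same order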


2 Comments

So I can make 'num_augmented' as large as I want? Would that be the right approach to create a large training dataset from a small amount of data?
Umm...from my own experience, I feel like there's not too much information you could "squeeze out of" your data by image augmentation. Image augmentation makes your model more robust by applying small random transformations on the data. If you distort the image too much, your model may learn some undesirable patterns from it. The effect of a large num_augmented is more like running through the same dataset for several epochs, rather than a several-times-larger dataset.
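For what it's worth, a rough sketch of what collecting several augmented passes would look like (the factor of 3 and the variable names are arbitrary, just for illustration):

    import numpy as np

    n_copies = 3  # arbitrary: how many augmented passes over the data to collect
    target = n_copies * train_data.shape[0]

    augmented = []
    count = 0
    for X_batch, _ in datagen.flow(train_data, train_labels, batch_size=batch_size, shuffle=False):
        augmented.append(X_batch)
        count += X_batch.shape[0]
        if count >= target:
            break

    augmented = np.concatenate(augmented)[:target]  # shape (3 * 1600, 4, 100, 100)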
