
I'm trying to solve a captcha dataset using an autoencoder. The dataset consists of RGB images.

I converted the RGB images to a single channel, i.e.:

[image: binarized one-channel captcha reading "emwpn"]

(The shape of the image is (48, 200)).

Next, I took the text of the captcha (in this case "emwpn") and created another image with the same shape (48, 200) containing that text, i.e.:

[image: rendered target text "emwpn", shape (48, 200)]
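For reference, a minimal sketch of how such a target image could be generated (the font, text position, and the helper name `render_label` are my own assumptions, not taken from the notebook):

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_label(text, shape=(48, 200)):
    """Render a captcha label as a binary (0/1) image of the given shape."""
    img = Image.new("L", (shape[1], shape[0]), color=0)   # PIL size is (width, height)
    draw = ImageDraw.Draw(img)
    draw.text((10, 10), text, fill=255, font=ImageFont.load_default())
    return (np.asarray(img) > 128).astype("float32")      # binarize to {0, 1}

target = render_label("emwpn")
print(target.shape)  # (48, 200)
```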

What I tried is to feed the captchas into the autoencoder as inputs and use the images I created as the targets the decoder should reconstruct.

I didn't know whether this method would work well, but I didn't expect it to learn nothing at all. When I ran predictions on the test dataset, all I got was purple images, i.e.:

capchas_array_test_pred = conv_ae.predict(capchas_array_test)
plt.imshow(capchas_array_test_pred[1])

[image: solid purple predicted output]

This means that the autoencoder predicts 0 for all the pixels of all the images.
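A quick way to confirm this (a sketch; `is_constant` is a hypothetical helper, not from the notebook): with matplotlib's default viridis colormap, an all-zeros image renders as solid purple, so checking the min/max of the predicted batch tells you whether the network has collapsed to a constant output. Passing `cmap="gray", vmin=0, vmax=1` to `imshow` also makes such failures easier to read.

```python
import numpy as np

def is_constant(batch, tol=1e-6):
    """True if every pixel of every image in the batch is (numerically) identical."""
    return float(batch.max() - batch.min()) < tol

preds = np.zeros((10, 48, 200), dtype="float32")  # stand-in for conv_ae.predict(...)
print(is_constant(preds))  # True -> the model outputs the same value everywhere
```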

This is the code for the conv autoencoder:

def rounded_accuracy(y_true, y_pred):
    return keras.metrics.binary_accuracy(tf.round(y_true), tf.round(y_pred))

conv_encoder = keras.models.Sequential([
    keras.layers.Reshape([48, 200, 1], input_shape=[48, 200]),
    keras.layers.Conv2D(16, kernel_size=5, padding="SAME"),
    keras.layers.BatchNormalization(),
    keras.layers.Activation("relu"),
    keras.layers.Conv2D(32, kernel_size=5, padding="SAME", activation="selu"),
    keras.layers.Conv2D(64, kernel_size=5, padding="SAME", activation="selu"),
    keras.layers.AvgPool2D(pool_size=2),
])
conv_decoder = keras.models.Sequential([
    keras.layers.Conv2DTranspose(32, kernel_size=5, strides=2, padding="SAME", activation="selu",
                                input_shape=[6, 25, 64]),
    keras.layers.Conv2DTranspose(16, kernel_size=5, strides=1, padding="SAME", activation="selu"),
    keras.layers.Conv2DTranspose(1, kernel_size=5, strides=1, padding="SAME", activation="sigmoid"),
    keras.layers.Reshape([48, 200])
])

conv_ae = keras.models.Sequential([conv_encoder, conv_decoder])
conv_ae.compile(loss="mse", optimizer=keras.optimizers.Adam(lr=1e-1), metrics=[rounded_accuracy])
history = conv_ae.fit(capchas_array_train, capchas_array_rewritten_train, epochs=20,
                      validation_data=(capchas_array_valid, capchas_array_rewritten_valid))

The model didn't learn anything:

Epoch 2/20
24/24 [==============================] - 1s 53ms/step - loss: 60879.9883 - rounded_accuracy: 0.0637 - val_loss: 60930.7344 - val_rounded_accuracy: 0.0635
Epoch 3/20
24/24 [==============================] - 1s 53ms/step - loss: 60878.5781 - rounded_accuracy: 0.0637 - val_loss: 60930.7344 - val_rounded_accuracy: 0.0635
Epoch 4/20
24/24 [==============================] - 1s 53ms/step - loss: 60879.2656 - rounded_accuracy: 0.0637 - val_loss: 60930.7344 - val_rounded_accuracy: 0.0635
Epoch 5/20
24/24 [==============================] - 1s 53ms/step - loss: 60876.4648 - rounded_accuracy: 0.0637 - val_loss: 60930.7344 - val_rounded_accuracy: 0.0635
Epoch 6/20
24/24 [==============================] - 1s 53ms/step - loss: 60878.4883 - rounded_accuracy: 0.0637 - val_loss: 60930.7344 - val_rounded_accuracy: 0.0635
Epoch 7/20
24/24 [==============================] - 1s 53ms/step - loss: 60880.8242 - rounded_accuracy: 0.0637 - val_loss: 60930.7344 - val_rounded_accuracy: 0.0635
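One thing worth noting about these numbers: MSE between two arrays whose values lie in [0, 1] can never exceed 1, so a loss around 60 880 suggests that at least one of the arrays passed to `fit` may not be scaled the way the preprocessing intends. A quick sanity check (a sketch; `value_range` is my own helper name):

```python
import numpy as np

def value_range(batch):
    """Return (min, max); arrays meant to be in [0, 1] should stay inside that range."""
    return float(batch.min()), float(batch.max())

# e.g. value_range(capchas_array_rewritten_train) -- a max of 255.0 would
# reveal an array that was never scaled down to [0, 1]
print(value_range(np.array([0.0, 0.5, 255.0])))  # (0.0, 255.0)
```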

I also checked what happens if I feed the encoder and the decoder with the same images:

conv_ae.compile(loss="mse", optimizer=keras.optimizers.Adam(lr=1e-1), metrics=[rounded_accuracy])
history = conv_ae.fit(capchas_array_train, capchas_array_train, epochs=20,
                      validation_data=(capchas_array_valid, capchas_array_valid))

And again I got purple images:

[image: solid purple predicted output]

P.S. If you're interested, this is the notebook: https://colab.research.google.com/drive/1gA1XN1NOZKylGDhVu4PKXWhrPU4q9Ady


EDIT:

This is the preprocessing I did to the images:

1. Convert the RGB image to one channel.
2. Binarize the image: map each pixel from the 0–255 range to either 0.0 or 1.0 (threshold at 128).
3. Resize the (50, 200) image to (48, 200), for simpler pooling in the autoencoder (48 can be halved more times than 50 while staying an integer).
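Step 3 can be done, for example, with Pillow (a sketch, not the notebook's code; note that PIL's `resize` takes `(width, height)`, and re-thresholding after interpolation keeps the image strictly binary):

```python
import numpy as np
from PIL import Image

def resize_binary(gray, height=48, width=200):
    """Resize a binarized (H, W) float array and re-threshold after interpolation."""
    img = Image.fromarray((gray * 255).astype("uint8"))
    resized = np.asarray(img.resize((width, height))) / 255.0
    return (resized > 0.5).astype("float32")  # keep the image strictly {0, 1}

out = resize_binary(np.ones((50, 200), dtype="float32"))
print(out.shape)  # (48, 200)
```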

This is the function for preprocessing steps 1 and 2:

def rgb2gray(rgb):
    # Luminance conversion (ITU-R BT.601 weights)
    r, g, b = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b

    # Binarize: pixels brighter than 128 become 1.0, the rest 0.0
    for x in range(rgb.shape[1]):
        for y in range(rgb.shape[0]):
            if gray[y][x] > 128:
                gray[y][x] = 1.0
            else:
                gray[y][x] = 0.0
    return gray
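As an aside, the per-pixel loops above can be replaced by a vectorized NumPy version with the same behavior (a sketch; `rgb2gray_fast` is my own name):

```python
import numpy as np

def rgb2gray_fast(rgb):
    """Vectorized luminance conversion followed by the same >128 binarization."""
    gray = rgb[..., 0] * 0.2989 + rgb[..., 1] * 0.5870 + rgb[..., 2] * 0.1140
    return (gray > 128).astype("float64")

rgb = np.random.randint(0, 256, size=(50, 200, 3)).astype("float64")
binary = rgb2gray_fast(rgb)
print(binary.shape)  # (50, 200)
```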
  • The text is not in the same location; you should probably standardize that. Commented Apr 14, 2020 at 15:54
  • Your decoder is lacking downsampling between the Conv2D layers; is that intentional? Commented Apr 14, 2020 at 15:58
  • I would first try a model with only the reshapes, to check that the learning is correctly set up, and then add a single Conv2D layer. Commented Apr 14, 2020 at 16:01
  • @jonnor Yeah, this is why I didn't expect it to work well, but I expected it to at least do something. The encoder also upsamples the number of filters. Commented Apr 14, 2020 at 16:02
  • @jonnor Just tried only reshapes, and this gives back the same image as the input. Commented Apr 14, 2020 at 16:04

1 Answer

  1. Your architecture doesn't make sense. If you want to create an autoencoder, you need to understand that you're going to reverse the process after encoding. That means that if you have three convolutional layers with filters in this order: 64, 32, 16, you should make the next group of convolutional layers do the inverse: 16, 32, 64. That's why your algorithm is not learning.
  2. You won't get the result that you expected. You will get a similar structure to that kind of captcha, but you won't get clearly legible text output. If you want that, you need another kind of algorithm (one that allows you to do character segmentation).

2 Comments

As far as I know, the decoder is the reverse process of the encoder in my code. The decoder gets an image of shape [6, 25, 64] (image size (6, 25) with 64 filters), then the number of filters goes to 32, then 16, then 1.
If you have a solution, please show me what to fix in my code.
