
I'm trying to solve a captcha dataset using an autoencoder. The dataset consists of RGB images.

I converted the RGB images to a single channel, i.e.:

[image: binarized one-channel captcha reading "emwpn"]

(The shape of the image is (48, 200)).

Next, I took the text of the captcha (in this case "emwpn") and created another image with the same shape (48, 200) containing that text, i.e.:

[image: rendered target text "emwpn", shape (48, 200)]
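For reference, a minimal sketch of how such a target image could be generated (the font, text position, and the helper name `render_label` are my own assumptions, not taken from the notebook):

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_label(text, shape=(48, 200)):
    """Render a captcha label as a binary (0/1) image of the given shape."""
    img = Image.new("L", (shape[1], shape[0]), color=0)   # PIL size is (width, height)
    draw = ImageDraw.Draw(img)
    draw.text((10, 10), text, fill=255, font=ImageFont.load_default())
    return (np.asarray(img) > 128).astype("float32")      # binarize to {0, 1}

target = render_label("emwpn")
print(target.shape)  # (48, 200)
```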

What I tried is to feed the captchas into the autoencoder as inputs and use the images I created as the targets the decoder should reconstruct.

I didn't know whether this method would work well, but I didn't expect it to learn nothing at all. When I ran predictions on the test dataset, all I got was purple images, i.e.:

capchas_array_test_pred = conv_ae.predict(capchas_array_test)
plt.imshow(capchas_array_test_pred[1])

[image: solid purple predicted output]

This means that the autoencoder predicts 0 for all the pixels of all the images.
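A quick way to confirm this (a sketch; `is_constant` is a hypothetical helper, not from the notebook): with matplotlib's default viridis colormap, an all-zeros image renders as solid purple, so checking the min/max of the predicted batch tells you whether the network has collapsed to a constant output. Passing `cmap="gray", vmin=0, vmax=1` to `imshow` also makes such failures easier to read.

```python
import numpy as np

def is_constant(batch, tol=1e-6):
    """True if every pixel of every image in the batch is (numerically) identical."""
    return float(batch.max() - batch.min()) < tol

preds = np.zeros((10, 48, 200), dtype="float32")  # stand-in for conv_ae.predict(...)
print(is_constant(preds))  # True -> the model outputs the same value everywhere
```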

This is the code for the conv autoencoder:

def rounded_accuracy(y_true, y_pred):
    return keras.metrics.binary_accuracy(tf.round(y_true), tf.round(y_pred))

conv_encoder = keras.models.Sequential([
    keras.layers.Reshape([48, 200, 1], input_shape=[48, 200]),
    keras.layers.Conv2D(16, kernel_size=5, padding="SAME"),
    keras.layers.BatchNormalization(),
    keras.layers.Activation("relu"),
    keras.layers.Conv2D(32, kernel_size=5, padding="SAME", activation="selu"),
    keras.layers.Conv2D(64, kernel_size=5, padding="SAME", activation="selu"),
    keras.layers.AvgPool2D(pool_size=2),
])
conv_decoder = keras.models.Sequential([
    keras.layers.Conv2DTranspose(32, kernel_size=5, strides=2, padding="SAME", activation="selu",
                                input_shape=[6, 25, 64]),
    keras.layers.Conv2DTranspose(16, kernel_size=5, strides=1, padding="SAME", activation="selu"),
    keras.layers.Conv2DTranspose(1, kernel_size=5, strides=1, padding="SAME", activation="sigmoid"),
    keras.layers.Reshape([48, 200])
])

conv_ae = keras.models.Sequential([conv_encoder, conv_decoder])
conv_ae.compile(loss="mse", optimizer=keras.optimizers.Adam(lr=1e-1), metrics=[rounded_accuracy])
history = conv_ae.fit(capchas_array_train, capchas_array_rewritten_train, epochs=20,
                      validation_data=(capchas_array_valid, capchas_array_rewritten_valid))

The model didn't learn anything:

Epoch 2/20
24/24 [==============================] - 1s 53ms/step - loss: 60879.9883 - rounded_accuracy: 0.0637 - val_loss: 60930.7344 - val_rounded_accuracy: 0.0635
Epoch 3/20
24/24 [==============================] - 1s 53ms/step - loss: 60878.5781 - rounded_accuracy: 0.0637 - val_loss: 60930.7344 - val_rounded_accuracy: 0.0635
Epoch 4/20
24/24 [==============================] - 1s 53ms/step - loss: 60879.2656 - rounded_accuracy: 0.0637 - val_loss: 60930.7344 - val_rounded_accuracy: 0.0635
Epoch 5/20
24/24 [==============================] - 1s 53ms/step - loss: 60876.4648 - rounded_accuracy: 0.0637 - val_loss: 60930.7344 - val_rounded_accuracy: 0.0635
Epoch 6/20
24/24 [==============================] - 1s 53ms/step - loss: 60878.4883 - rounded_accuracy: 0.0637 - val_loss: 60930.7344 - val_rounded_accuracy: 0.0635
Epoch 7/20
24/24 [==============================] - 1s 53ms/step - loss: 60880.8242 - rounded_accuracy: 0.0637 - val_loss: 60930.7344 - val_rounded_accuracy: 0.0635
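One thing worth noting about these numbers: MSE between two arrays whose values lie in [0, 1] can never exceed 1, so a loss around 60 880 suggests that at least one of the arrays passed to `fit` may not be scaled the way the preprocessing intends. A quick sanity check (a sketch; `value_range` is my own helper name):

```python
import numpy as np

def value_range(batch):
    """Return (min, max); arrays meant to be in [0, 1] should stay inside that range."""
    return float(batch.min()), float(batch.max())

# e.g. value_range(capchas_array_rewritten_train) -- a max of 255.0 would
# reveal an array that was never scaled down to [0, 1]
print(value_range(np.array([0.0, 0.5, 255.0])))  # (0.0, 255.0)
```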

I also checked what happens if I feed the encoder and the decoder with the same images:

conv_ae.compile(loss="mse", optimizer=keras.optimizers.Adam(lr=1e-1), metrics=[rounded_accuracy])
history = conv_ae.fit(capchas_array_train, capchas_array_train, epochs=20,
                      validation_data=(capchas_array_valid, capchas_array_valid))

And again I got purple images:

[image: solid purple predicted output]

P.S. If you're interested, this is the notebook: https://colab.research.google.com/drive/1gA1XN1NOZKylGDhVu4PKXWhrPU4q9Ady


EDIT:

This is the preprocessing I did to the images:

1. Convert the RGB image to one channel.
2. Binarize the image: map each pixel from the 0–255 range to either 0.0 or 1.0 (threshold at 128).
3. Resize the (50, 200) image to (48, 200), for simpler pooling in the autoencoder (48 can be halved more times than 50 while staying an integer).
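Step 3 can be done, for example, with Pillow (a sketch, not the notebook's code; note that PIL's `resize` takes `(width, height)`, and re-thresholding after interpolation keeps the image strictly binary):

```python
import numpy as np
from PIL import Image

def resize_binary(gray, height=48, width=200):
    """Resize a binarized (H, W) float array and re-threshold after interpolation."""
    img = Image.fromarray((gray * 255).astype("uint8"))
    resized = np.asarray(img.resize((width, height))) / 255.0
    return (resized > 0.5).astype("float32")  # keep the image strictly {0, 1}

out = resize_binary(np.ones((50, 200), dtype="float32"))
print(out.shape)  # (48, 200)
```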

This is the function for preprocessing steps 1 and 2:

def rgb2gray(rgb):
    # Luminance conversion (ITU-R BT.601 weights)
    r, g, b = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b

    # Binarize: pixels brighter than 128 become 1.0, the rest 0.0
    for x in range(rgb.shape[1]):
        for y in range(rgb.shape[0]):
            if gray[y][x] > 128:
                gray[y][x] = 1.0
            else:
                gray[y][x] = 0.0
    return gray
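As an aside, the per-pixel loops above can be replaced by a vectorized NumPy version with the same behavior (a sketch; `rgb2gray_fast` is my own name):

```python
import numpy as np

def rgb2gray_fast(rgb):
    """Vectorized luminance conversion followed by the same >128 binarization."""
    gray = rgb[..., 0] * 0.2989 + rgb[..., 1] * 0.5870 + rgb[..., 2] * 0.1140
    return (gray > 128).astype("float64")

rgb = np.random.randint(0, 256, size=(50, 200, 3)).astype("float64")
binary = rgb2gray_fast(rgb)
print(binary.shape)  # (50, 200)
```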
  • The text is not in the same location; you should probably standardize that. Commented Apr 14, 2020 at 15:54
  • Your decoder is lacking downsampling between the Conv2D layers; is that intentional? Commented Apr 14, 2020 at 15:58
  • I would first try a model with only the reshapes, to check that the learning is correctly set up, and then add a single Conv2D layer. Commented Apr 14, 2020 at 16:01
  • @jonnor Yeah, this is why I didn't expect it to work well, but I expected it to at least do something. The encoder also upsamples the number of filters. Commented Apr 14, 2020 at 16:02
  • @jonnor Just tried only reshapes, and this gives back the same image as the input. Commented Apr 14, 2020 at 16:04

1 Answer

  1. Your architecture doesn't make sense. If you want to create an autoencoder, you need to understand that you're going to reverse the process after encoding. That means that if you have three convolutional layers with filters in this order: 64, 32, 16, you should make the next group of convolutional layers do the inverse: 16, 32, 64. That's why your algorithm is not learning.
  2. You won't get the result that you expected. You will get a similar structure to that kind of captcha, but you won't get clearly legible text output. If you want that, you need another kind of algorithm (one that allows you to do character segmentation).

2 Comments

As far as I know, the decoder is the reverse process of the encoder in my code. The decoder gets an image of shape [6, 25, 64] (image size (6, 25) with 64 filters), then the number of filters goes to 32, then 16, then 1.
If you have a solution, please show me what to fix in my code.
