
I have created a CNN model using Keras and I am training it on the MNIST dataset. I get a reasonable accuracy of around 98%, which is what I expected:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(64, 5, activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPool2D())
model.add(Conv2D(64, 5, activation="relu"))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', 
    loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(data.x_train, data.y_train, 
    batch_size=256, validation_data=(data.x_test, data.y_test))

Now I want to build the same model using plain TensorFlow. Here is how I did that:

import tensorflow as tf

X = tf.placeholder(shape=[None, 784], dtype=tf.float32, name="X")
Y = tf.placeholder(shape=[None, 10], dtype=tf.float32, name="Y")

net = tf.reshape(X, [-1, 28, 28, 1])
net = tf.layers.conv2d(
  net, filters=64, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.layers.conv2d(
  net, filters=64, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.contrib.layers.flatten(net)
net = tf.layers.dense(net, name="dense1", units=256, activation=tf.nn.relu)
model = tf.layers.dense(net, name="output", units=10)

And here is how I train/test it:

loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=model)
opt = tf.train.AdamOptimizer().minimize(loss)
accuracy = tf.cast(tf.equal(tf.argmax(model, 1), tf.argmax(Y, 1)), tf.float32)

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    for batch in range(data.get_number_of_train_batches(batch_size)):
        x, y = data.get_next_train_batch(batch_size)
        sess.run([loss, opt], feed_dict={X: x, Y: y})

    for batch in range(data.get_number_of_test_batches(batch_size)):
        x, y = data.get_next_test_batch(batch_size)
        sess.run(accuracy, feed_dict={X: x, Y: y})
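One apparent difference between the two versions is only cosmetic: the Keras model ends in a softmax layer trained with `categorical_crossentropy`, while the TensorFlow model outputs raw logits and uses `tf.nn.softmax_cross_entropy_with_logits_v2`, which folds the softmax into the loss. The NumPy sketch below (an illustration, not the libraries' actual implementations) shows that both formulations give the same per-example loss, and that a correctness check like the `accuracy` tensor above is a per-example vector that still has to be averaged (for example with `tf.reduce_mean`) to get a batch accuracy:

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def crossentropy_on_probs(y_true, probs, eps=1e-7):
    # Cross-entropy on probabilities (softmax layer + categorical cross-entropy);
    # clipped to avoid log(0)
    probs = np.clip(probs, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(probs), axis=1)

def crossentropy_on_logits(labels, logits):
    # Fused version on raw logits: log-softmax via the log-sum-exp trick
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.sum(labels * log_probs, axis=1)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
labels = np.eye(10)[[3, 1, 7, 0]]  # one-hot labels for 4 examples

loss_probs = crossentropy_on_probs(labels, softmax(logits))
loss_logits = crossentropy_on_logits(labels, logits)
print(np.allclose(loss_probs, loss_logits))  # both are vectors of shape (4,)

# Per-example correctness (1.0 or 0.0), like the `accuracy` tensor in the question
correct = (logits.argmax(axis=1) == labels.argmax(axis=1)).astype(np.float32)
print(correct.shape, correct.mean())  # the mean is the batch accuracy
```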

But the resulting accuracy of the model dropped to ~80%. What are the principal differences between my Keras and TensorFlow implementations of this model? Why does the accuracy vary so much?

  • Are you importing the MNIST dataset from Keras? Could you add that to your code to make it reproducible? (commented Feb 10, 2019 at 8:48)

2 Answers


I don't see any mistakes in your code. Note, however, that your current model is heavily over-parameterized for such a simple problem because of the Dense layers, which introduce over 260k trainable parameters:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_3 (Conv2D)            (None, 24, 24, 64)        1664      
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 64)          102464    
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 4, 4, 64)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 1024)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 256)               262400    
_________________________________________________________________
dense_3 (Dense)              (None, 10)                2570      
=================================================================
Total params: 369,098
Trainable params: 369,098
Non-trainable params: 0
_________________________________________________________________

Below, I run your code with:

  • minor adaptations to make the code work with the MNIST dataset from keras.datasets;
  • a simplified model: I remove the 256-unit Dense layer and halve the number of convolutional filters from 64 to 32, drastically reducing the number of trainable parameters, and I add some dropout for regularization.

With these changes, both models reach well over 90% validation accuracy (about 97%) after the first epoch. So the problem you encountered seems to come from a hard optimization problem (a large model trained for a single epoch) that leads to highly variable outcomes, rather than from a bug in your code.

# Import the datasets
import numpy as np
from keras.datasets import mnist
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Add a trailing channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# One-hot encode the labels
y_train = to_categorical(y_train, num_classes=None)
y_test = to_categorical(y_test, num_classes=None)
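The preprocessing above only adds a channel axis and one-hot encodes the labels. The NumPy sketch below uses random stand-in arrays (an assumption, so it runs without downloading MNIST) to show the resulting shapes, and that indexing an identity matrix with integer labels yields the same one-hot rows that `to_categorical` produces for labels 0 to 9. The pixel scaling at the end is a common optional step that the answer does not use:

```python
import numpy as np

# Random stand-ins with MNIST's shapes and dtypes (the real data comes from mnist.load_data())
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(100, 28, 28)).astype(np.uint8)
y_train = rng.integers(0, 10, size=100)

# expand_dims(axis=-1) appends a channel axis: (N, 28, 28) -> (N, 28, 28, 1)
x_train = np.expand_dims(x_train, axis=-1)

# One-hot encoding via an identity-matrix lookup: label k -> row k of the 10x10 identity
y_train_onehot = np.eye(10, dtype=np.float32)[y_train]

# Optional, not done in the answer: scale pixels from [0, 255] to [0, 1]
x_train_scaled = x_train.astype(np.float32) / 255.0

print(x_train.shape, y_train_onehot.shape)
```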

batch_size = 64  # used by the TensorFlow loop below; the Keras fit call uses 32

# Fit model using Keras
import keras
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout
from keras.models import Sequential

model = Sequential()
model.add(Conv2D(32, 5, activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPool2D())
model.add(Conv2D(32, 5, activation="relu"))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dropout(0.25))  # the dropout_1 layer in the summary below; rate as in the TensorFlow version
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', 
    loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, 
    batch_size=32, validation_data=(x_test, y_test), epochs=1)

Result:

Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 35s 583us/step - loss: 1.5217 - acc: 0.8736 - val_loss: 0.0850 - val_acc: 0.9742

Note that the number of trainable parameters is now only a fraction of that in your model (31,594 vs. 369,098):

model.summary()
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_3 (Conv2D)            (None, 24, 24, 32)        832       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 12, 12, 32)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 32)          25632     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 4, 4, 32)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 512)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                5130      
=================================================================
Total params: 31,594
Trainable params: 31,594
Non-trainable params: 0
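Both summaries follow from the standard counting rules: a convolution with k×k kernels has (k·k·C_in + 1)·C_out parameters, a dense layer has (n_in + 1)·n_out, and pooling, flattening and dropout have none. A small sketch reproducing the totals of the original and the reduced model (the helper names are illustrative, not Keras API):

```python
def conv_params(kernel, in_channels, filters):
    # Each filter has kernel*kernel*in_channels weights plus one bias
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(in_units, out_units):
    # Full weight matrix plus one bias per output unit
    return (in_units + 1) * out_units

def count_params(filters, hidden_units=None, kernel=5, size=28, classes=10):
    total = 0
    channels = 1
    for _ in range(2):  # two blocks of (valid-padding conv, 2x2 max-pool)
        total += conv_params(kernel, channels, filters)
        size = (size - kernel + 1) // 2  # valid conv shrinks by kernel-1, pooling halves
        channels = filters
    flat = size * size * channels  # Flatten output size
    if hidden_units is not None:
        total += dense_params(flat, hidden_units)
        flat = hidden_units
    total += dense_params(flat, classes)
    return total

original = count_params(filters=64, hidden_units=256)  # question's model
reduced = count_params(filters=32)                     # model in this answer
print(original, reduced, round(reduced / original, 3))
```

Most of the reduction comes from dropping the hidden Dense layer, which alone accounts for 262,400 of the original parameters.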

Now, doing the same with TensorFlow:

# Fit model using TensorFlow
import tensorflow as tf

X = tf.placeholder(shape=[None, 28, 28, 1], dtype=tf.float32, name="X")
Y = tf.placeholder(shape=[None, 10], dtype=tf.float32, name="Y")

net = tf.layers.conv2d(
  X, filters=32, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.layers.conv2d(
  net, filters=32, kernel_size=5, padding="valid", activation=tf.nn.relu)
net = tf.layers.max_pooling2d(net, pool_size=2, strides=2)
net = tf.contrib.layers.flatten(net)
net = tf.layers.dropout(net, rate=0.25)  # NB: `training` defaults to False, so this layer is inactive unless training=True is passed
model = tf.layers.dense(net, name="output", units=10)

loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=model)
opt = tf.train.AdamOptimizer().minimize(loss)
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(model, 1), tf.argmax(Y, 1)), tf.float32))

with tf.Session() as sess:
    tf.global_variables_initializer().run()
    L = []
    l_ = 0
    for i in range(x_train.shape[0] // batch_size):
        x, y = x_train[i*batch_size:(i+1)*batch_size],\
            y_train[i*batch_size:(i+1)*batch_size]
        l, _ = sess.run([loss, opt], feed_dict={X: x, Y: y})
        l_ += np.mean(l)
    L.append(l_ / (x_train.shape[0] // batch_size))
    print('Training loss: {:.3f}'.format(L[-1]))

    acc = []
    for j in range(x_test.shape[0] // batch_size):
        x, y = x_test[j*batch_size:(j+1)*batch_size],\
            y_test[j*batch_size:(j+1)*batch_size]
        acc.append(sess.run(accuracy, feed_dict={X: x, Y: y}))
    print('Test set accuracy: {:.3f}'.format(np.mean(acc)))

Result:

Training loss: 0.519
Test set accuracy: 0.968
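Two details of the evaluation loop above: `tf.reduce_mean` turns the per-example correctness vector into a batch accuracy, and because the loop runs `x_test.shape[0] // batch_size` batches it skips the last 10000 − 156·64 = 16 test images. A NumPy sketch of an evaluation that covers every example and counts each one exactly once, so a smaller final batch does not skew the average; `predict_fn` is a hypothetical stand-in for something like `sess.run(model, feed_dict={X: xb})`:

```python
import numpy as np

def evaluate_accuracy(predict_fn, x, y_onehot, batch_size):
    # Step through the data in batch_size strides; the final partial batch is included
    n = x.shape[0]
    correct = 0.0
    for start in range(0, n, batch_size):
        xb = x[start:start + batch_size]
        yb = y_onehot[start:start + batch_size]
        logits = predict_fn(xb)
        # Sum (not mean) of correct predictions, so each example counts once
        correct += np.sum(logits.argmax(axis=1) == yb.argmax(axis=1))
    return correct / n

# Toy check with a hypothetical "model" that returns its inputs as logits
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)
y = np.eye(10)[labels]
x = y.copy()  # stand-in inputs equal to the one-hot labels
acc = evaluate_accuracy(lambda xb: xb, x, y, batch_size=64)
print(acc)
```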

2 Comments

Thank you, sir. There was an issue with my implementation of get_number_of_train_batches and get_number_of_test_batches; that is why the TensorFlow implementation yielded such low accuracy.
Okay. I hope you'll find my answer insightful anyway!
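The asker's `get_number_of_train_batches` and `get_number_of_test_batches` helpers are not shown, so the actual bug is unknown. Purely as an illustration, here is a hypothetical `BatchedData` class (its name and behavior are assumptions, loosely modeled on the calls in the question) in which the batch counts and the batch iterators agree, so one loop over `get_number_of_train_batches(batch_size)` visits every training example exactly once:

```python
import math
import numpy as np

class BatchedData:
    """Hypothetical wrapper mirroring the method names used in the question."""

    def __init__(self, x_train, y_train, x_test, y_test, seed=0):
        self.x_train, self.y_train = x_train, y_train
        self.x_test, self.y_test = x_test, y_test
        self._rng = np.random.default_rng(seed)
        self._train_pos = 0
        self._test_pos = 0
        self._train_order = self._rng.permutation(len(x_train))

    def get_number_of_train_batches(self, batch_size):
        # Ceil-division: the last, possibly smaller, batch is counted too
        return math.ceil(len(self.x_train) / batch_size)

    def get_number_of_test_batches(self, batch_size):
        return math.ceil(len(self.x_test) / batch_size)

    def get_next_train_batch(self, batch_size):
        idx = self._train_order[self._train_pos:self._train_pos + batch_size]
        self._train_pos += batch_size
        if self._train_pos >= len(self.x_train):
            # Epoch finished: reshuffle and start over
            self._train_pos = 0
            self._train_order = self._rng.permutation(len(self.x_train))
        return self.x_train[idx], self.y_train[idx]

    def get_next_test_batch(self, batch_size):
        # Sequential, unshuffled slices; wrap to the start after the last one
        start = self._test_pos
        end = start + batch_size
        self._test_pos = end if end < len(self.x_test) else 0
        return self.x_test[start:end], self.y_test[start:end]

data = BatchedData(np.arange(130), np.arange(130), np.arange(50), np.arange(50))
n_batches = data.get_number_of_train_batches(32)  # 5 = ceil(130 / 32)
seen = np.concatenate([data.get_next_train_batch(32)[0] for _ in range(n_batches)])
print(n_batches, len(seen), len(np.unique(seen)))
```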

Possible improvements to your models:

I have used CNNs on different problems and have always gotten good improvements from regularization techniques, the best ones from dropout.

I suggest using Dropout on the Dense layers and, possibly with a lower rate, on the convolutional ones.
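Dropout is usually implemented as "inverted dropout": during training each activation is zeroed with probability `rate` and the survivors are scaled by 1/(1 − rate), while at inference the layer is the identity. Keras's `Dropout` layer switches between the two modes automatically, whereas `tf.layers.dropout` only drops units when called with `training=True` (its default is `False`). A NumPy sketch of the mechanism, not either library's code:

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero units with probability `rate` during training
    and scale the survivors so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return x  # identity at inference time
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
acts = np.ones((1000, 512))

train_out = dropout(acts, rate=0.25, training=True, rng=rng)
infer_out = dropout(acts, rate=0.25, training=False, rng=rng)

print((train_out == 0).mean())  # roughly 0.25 of units dropped
print(train_out.mean())         # roughly 1.0: expectation preserved
print(np.array_equal(infer_out, acts))
```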

Data augmentation of the input data is also very important, but its applicability depends on the problem domain.
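For MNIST-like digits, one simple augmentation is a random translation by a couple of pixels. A NumPy sketch, assuming inputs of shape (N, 28, 28, 1) as in the first answer, that zero-pads each image and crops a randomly offset window of the original size (the `random_shift` name and the `max_shift` value are illustrative choices):

```python
import numpy as np

def random_shift(images, max_shift, rng):
    """Translate each image by up to `max_shift` pixels in each direction,
    filling the uncovered border with zeros."""
    n, h, w, c = images.shape
    padded = np.pad(images, ((0, 0), (max_shift, max_shift), (max_shift, max_shift), (0, 0)))
    out = np.empty_like(images)
    offsets = rng.integers(0, 2 * max_shift + 1, size=(n, 2))
    for i, (dy, dx) in enumerate(offsets):
        out[i] = padded[i, dy:dy + h, dx:dx + w, :]
    return out

rng = np.random.default_rng(0)
batch = rng.random((8, 28, 28, 1)).astype(np.float32)
augmented = random_shift(batch, max_shift=2, rng=rng)
print(augmented.shape)
```

It would be applied to each training batch before it is fed to the model; the labels stay unchanged because a small shift does not change which digit is shown.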

P.S.: In one case I had to switch the optimizer from Adam to SGD with momentum, so experimenting with the optimizer makes sense. Gradient clipping is also worth considering when your network stalls and stops improving, which may point to a numerical issue.
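In TensorFlow 1.x these two suggestions correspond to `tf.train.MomentumOptimizer` and `tf.clip_by_global_norm` (applied to the gradients returned by `compute_gradients` before calling `apply_gradients`). To show what they do numerically, here is a NumPy sketch of one SGD-with-momentum step on globally clipped gradients; the learning rate, momentum and clip value are illustrative, not recommendations:

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # Scale all gradients together so their joint L2 norm is at most clip_norm
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

def momentum_step(params, grads, velocities, lr, momentum):
    # Classic (non-Nesterov) momentum: v <- momentum * v + g ; p <- p - lr * v
    new_params, new_vel = [], []
    for p, g, v in zip(params, grads, velocities):
        v = momentum * v + g
        new_params.append(p - lr * v)
        new_vel.append(v)
    return new_params, new_vel

params = [np.array([1.0, -2.0]), np.array([0.5])]
grads = [np.array([30.0, 40.0]), np.array([0.0])]  # global norm 50
velocities = [np.zeros_like(p) for p in params]

clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
params, velocities = momentum_step(params, clipped, velocities, lr=0.1, momentum=0.9)
print(norm, clipped[0], params[0])
```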
