I'm building a neural network from scratch using only Python and NumPy. It's meant for classifying the MNIST dataset. I got everything to run, but the network isn't really learning: at epoch 0 its accuracy is about 12%, after 20 epochs it increases to 14%, and then it gradually drops back to around 12% by epoch 40. So it's clear that something is wrong with my backpropagation (and yes, I tried increasing the epochs to 150, but I still get the same results).
I actually followed this video, but I handled the dimensions differently, which led to the code being different: he made the rows the features and the columns the samples, while I did the opposite. So while backpropagating I had to transpose some arrays to make his algorithm compatible with my layout (I think this might be the reason it's not working).
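For concreteness, the identity I'm relying on is (A.dot(B)).T == B.T.dot(A.T): every column-major formula from the video should translate to my row-major layout by reversing the product order and transposing each factor. A minimal sketch of the two conventions (the names here are just for illustration, not from my actual code):

import numpy as np

# Column-major convention (the video's): samples are columns.
X_col = np.random.randn(784, 5)   # 5 samples as columns
W_col = np.random.randn(10, 784)
Z_col = W_col.dot(X_col)          # shape (10, 5)

# Row-major convention (mine): samples are rows.
X_row = X_col.T                   # shape (5, 784)
W_row = W_col.T                   # shape (784, 10)
Z_row = X_row.dot(W_row)          # shape (5, 10)

# The two layouts agree up to a transpose: (A.dot(B)).T == B.T.dot(A.T)
assert np.allclose(Z_row, Z_col.T)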
Loading the data:
import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255, x_test / 255
x_train, x_test = x_train.reshape(len(x_train), 28 * 28), x_test.reshape(len(x_test), 28 * 28)
print(x_train.shape) # (60000, 784)
print(x_test.shape) # (10000, 784)
Here's the meat of the model:
W1 = np.random.randn(784, 10)  # (inputs, hidden units)
b1 = np.random.randn(10)
W2 = np.random.randn(10, 10)   # (hidden units, outputs)
b2 = np.random.randn(10)
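I kept the plain np.random.randn initialization above, but I've also seen scaled schemes like He initialization recommended for ReLU layers. For reference, that variant would look like this (just a sketch of the alternative; I haven't ruled initialization in or out as a factor):

W1 = np.random.randn(784, 10) * np.sqrt(2 / 784)  # He init: std = sqrt(2 / fan_in)
b1 = np.zeros(10)
W2 = np.random.randn(10, 10) * np.sqrt(2 / 10)
b2 = np.zeros(10)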
def relu(x, deriv=False):
    if deriv:
        return x > 0  # derivative of ReLU: 1 where x > 0, else 0
    return np.maximum(x, 0)
def softmax(x):
    # subtract the row-wise max for numerical stability
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / e_x.sum(axis=1, keepdims=True)
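A quick check that the softmax behaves row-wise on a batch (each row of probabilities should sum to 1):

probs = softmax(np.random.randn(4, 10))
assert np.allclose(probs.sum(axis=1), 1.0)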
def one_hot_encode(y):
    y_hot = np.zeros(shape=(len(y), 10))
    for i in range(len(y)):
        y_hot[i][y[i]] = 1
    return y_hot
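For example:

one_hot_encode(np.array([2, 0]))
# array([[0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
#        [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])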
def loss_function(predictions, true):
    # Note: this is not the loss value itself; predictions - true is the
    # gradient of the cross-entropy loss w.r.t. the logits when the
    # predictions come from a softmax.
    return predictions - true
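If I want an actual loss value to monitor, a minimal cross-entropy sketch would be something like this (the 1e-9 just avoids log(0); the training loop below never calls it, it's only for monitoring and debugging):

def cross_entropy(predictions, true):
    # mean negative log-likelihood of the true class
    return -np.mean(np.sum(true * np.log(predictions + 1e-9), axis=1))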
def predict(x):
    Z1 = x.dot(W1) + b1   # (m, 784) · (784, 10) -> (m, 10)
    A1 = relu(Z1)
    Z2 = A1.dot(W2) + b2  # (m, 10) · (10, 10) -> (m, 10)
    A2 = softmax(Z2)
    # The final prediction is A2, i.e. index 3 (or -1) of the returned tuple:
    return Z1, A1, Z2, A2
def get_accuracy(predictions, Y):
    guesses = predictions.argmax(axis=1)
    correct = 0
    i = 0
    while i < len(guesses):
        if guesses[i] == Y[i]:
            correct += 1
        i += 1
    percent = (correct / len(guesses)) * 100
    return percent
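For reference, the loop above is equivalent to the vectorized form:

percent = np.mean(predictions.argmax(axis=1) == Y) * 100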
def train(data, labels, epochs=40, learning_rate=0.1):
    labels_one_hot = one_hot_encode(labels)  # constant across epochs
    m = len(labels_one_hot)
    for i in range(epochs):
        # Forward:
        Z1, A1, Z2, A2 = predict(data)
        # I think the error is in this chunk:
        # Backward pass:
        dZ2 = A2 - labels_one_hot
        dW2 = 1 / m * dZ2.T.dot(A1)
        db2 = 1 / m * np.sum(dZ2, axis=1)
        dZ1 = W2.dot(dZ2.T).T * relu(Z1, deriv=True)
        dW1 = 1 / m * dZ1.T.dot(data)
        db1 = 1 / m * np.sum(dZ1)
        # Update parameters:
        update(learning_rate, dW1, db1, dW2, db2)
        print("Iteration:", i + 1)
        predictions = predict(data)[-1]  # item at -1 is the final prediction.
        print(get_accuracy(predictions, labels))
def update(learning_rate, dW1, db1, dW2, db2):
    global W1, b1, W2, b2
    W1 = W1 - learning_rate * dW1.T  # I have to transpose it here.
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
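As a sanity check I plan to verify that every gradient has exactly the shape of the parameter it updates, since a wrong transpose can otherwise broadcast silently. A minimal sketch (check_shapes is a hypothetical helper, not part of my code):

def check_shapes(dW1, db1, dW2, db2):
    # Every gradient should have exactly the shape of the parameter it
    # updates; a mismatch that still broadcasts can hide a wrong transpose.
    for name, param, grad in [("W1", W1, dW1), ("b1", b1, db1),
                              ("W2", W2, dW2), ("b2", b2, db2)]:
        print(name, param.shape, np.shape(grad), param.shape == np.shape(grad))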
train(x_train, y_train)
predictions = predict(x_test)[-1]
print(get_accuracy(predictions, y_test)) # The result is about 11.5% accuracy.
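Another debugging step I'm planning is a numerical gradient check: perturb one weight, measure the change in the loss, and compare it against the analytic gradient from my backward pass. A minimal sketch of the idea, using the cross_entropy helper sketched above (I haven't run this yet):

def numerical_grad_W2(x, y_hot, row, col, eps=1e-5):
    # Central-difference estimate of d(loss)/d(W2[row, col]).
    W2[row, col] += eps
    loss_plus = cross_entropy(predict(x)[-1], y_hot)
    W2[row, col] -= 2 * eps
    loss_minus = cross_entropy(predict(x)[-1], y_hot)
    W2[row, col] += eps  # restore the original weight
    return (loss_plus - loss_minus) / (2 * eps)

If this estimate disagrees with the corresponding entry of dW2 on a small batch, the bug should be in that formula.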