I am creating a neural network from scratch for the MNIST data, so I have 10 classes in the output layer. I need to perform backpropagation, and for that I need to calculate dA*dZ for the last layer, where dA is the derivative of the loss function L with respect to the softmax activation A, and dZ is the derivative of the softmax activation A with respect to z, where z = wx + b. The size obtained for dA is 10x1, whereas the size obtained for dZ is 10x10.

Is this correct? If yes, how do I multiply dA*dZ, given that they have different dimensions?
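For reference, the 10x10 shape comes from the softmax Jacobian: every output A_i depends on every input z_j, so the derivative is a full matrix rather than a 10x1 vector:

    dA_i/dz_j = A_i * (delta_ij - A_j)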

1 Answer

You are almost there. However, you need to transpose dA, e.g. with numpy.transpose(dA) or dA.T. Then the shapes line up for matrix multiplication: a 1x10 row vector times the 10x10 Jacobian gives the 1x10 gradient of the loss with respect to z.
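
A minimal NumPy sketch of the idea (the variable names and random placeholder inputs are illustrative, not from the original post):

    import numpy as np

    def softmax(z):
        # Numerically stable softmax: shift by the max before exponentiating.
        e = np.exp(z - np.max(z))
        return e / np.sum(e)

    z = np.random.randn(10, 1)   # last-layer pre-activation, shape (10, 1)
    A = softmax(z)               # softmax output, shape (10, 1)

    # Jacobian of softmax w.r.t. z: dA_i/dz_j = A_i * (delta_ij - A_j)
    dZ = np.diagflat(A) - A @ A.T    # shape (10, 10)

    dA = np.random.randn(10, 1)  # stand-in for the upstream gradient dL/dA, shape (10, 1)

    dL_dz = dA.T @ dZ            # (1, 10) @ (10, 10) -> (1, 10)
    print(dL_dz.shape)           # (1, 10)

Because this Jacobian is symmetric, dA.T @ dZ and (dZ @ dA).T give the same values, so either ordering works once the dimensions agree.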

2 Comments

Thanks! By the way, is this the correct way to apply the softmax derivative?
Yes, it's right, but don't forget to multiply by the derivative of z with respect to w. The following link might be helpful for you: stats.stackexchange.com/questions/235528/…
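
To make the comment's point concrete, here is a continuation of the sketch above; the input x, its size n, and the weight shapes are assumptions for illustration, not details from the original post:

    # Continuing the sketch above: the last layer computes z = W @ x + b.
    n = 784                      # e.g. a flattened 28x28 MNIST image (assumption)
    x = np.random.randn(n, 1)    # input to the last layer, shape (n, 1)

    dL_dz = (dA.T @ dZ).T        # reshape the gradient into a (10, 1) column
    dL_dW = dL_dz @ x.T          # shape (10, n): chain rule through z = W @ x + b
    dL_db = dL_dz                # shape (10, 1): dz/db is the identity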
