
I am learning about neural networks and implementing one in Python. I first defined a softmax function, following the solution given in this question: Softmax function - python. Here is my code:

import numpy as np

def softmax(A):
    """
    Computes a softmax function.
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    e = np.exp(A)
    s = e / np.sum(e, axis=0)
    return s

I was given some test code to check whether the softmax function is correct. test_array is the test data and test_output is the expected output of softmax(test_array). Here is the test code:

# Test if your function works correctly.
test_array = np.array([[0.101,0.202,0.303],
                       [0.404,0.505,0.606]]) 
test_output = [[ 0.30028906,  0.33220277,  0.36750817],
               [ 0.30028906,  0.33220277,  0.36750817]]
print(np.allclose(softmax(test_array),test_output))

However, the softmax function I defined returns something different on this data:

print(softmax(test_array))

[[ 0.42482427  0.42482427  0.42482427]
 [ 0.57517573  0.57517573  0.57517573]]

Could anyone point out what is wrong with the softmax function I defined?

4 Answers


The problem is in your sum. You are summing over axis 0, when axis 0 is the one you should leave untouched.

To sum over all the entries in the same example, i.e., in the same row, you have to use axis 1 instead.

def softmax(A):
    """
    Computes a softmax function. 
    Input: A (N, k) ndarray.
    Returns: (N, k) ndarray.
    """
    e = np.exp(A)
    return e / np.sum(e, axis=1, keepdims=True)

Use keepdims=True to preserve the shape, so that e can be divided by the sum row by row.
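
To see what keepdims changes, here is a minimal sketch (using the test_array from the question) comparing the shapes of the two sums:

import numpy as np

e = np.exp(np.array([[0.101, 0.202, 0.303],
                     [0.404, 0.505, 0.606]]))   # shape (2, 3)

print(np.sum(e, axis=1).shape)                  # (2,)   -- dividing e by this raises a broadcasting error
print(np.sum(e, axis=1, keepdims=True).shape)   # (2, 1) -- broadcasts against (2, 3) row by row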

In your example, e evaluates to:

[[ 1.10627664  1.22384801  1.35391446]
 [ 1.49780395  1.65698552  1.83308438]]

then the sum for each example (denominator in the return line) is:

[[ 3.68403911]
 [ 4.98787384]]

The function then divides each row by its sum and gives the result you have in test_output.

As MaxU pointed out, it is good practice to subtract the max before exponentiating, in order to avoid overflow:

e = np.exp(A - np.max(A, axis=1, keepdims=True))
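
A quick sketch of why this is safe: subtracting a row-wise constant leaves the softmax unchanged, because the factor it introduces cancels between numerator and denominator, while keeping np.exp within range:

A = np.array([[1000.0, 1001.0, 1002.0]])

# np.exp(1000.0) alone overflows to inf, which would make the unshifted version return nan.
shifted = A - np.max(A, axis=1, keepdims=True)   # [[-2. -1.  0.]]
e = np.exp(shifted)
print(e / np.sum(e, axis=1, keepdims=True))      # [[ 0.09003057  0.24472847  0.66524096]]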



Try this:

In [327]: def softmax(A):
     ...:     e = np.exp(A)
     ...:     return e / e.sum(axis=1).reshape((-1, 1))

In [328]: softmax(test_array)
Out[328]:
array([[ 0.30028906,  0.33220277,  0.36750817],
       [ 0.30028906,  0.33220277,  0.36750817]])

or better, this version, which prevents overflow when large values are exponentiated:

def softmax(A):
    e = np.exp(A - np.max(A, axis=1).reshape((-1, 1)))
    return e / e.sum(axis=1).reshape((-1, 1))
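
For instance (a quick sketch continuing the session), inputs that overflow the first version come out fine here:

In [329]: softmax(np.array([[1000., 1001., 1002.]]))
Out[329]: array([[ 0.09003057,  0.24472847,  0.66524096]])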



You can print np.sum(e, axis=0) yourself. You will see it is an array with 3 elements: [ 2.60408059  2.88083353  3.18699884]. Then e / np.sum(e, axis=0) divides each row of e by this 3-element array, i.e., it normalizes the columns of e. Apparently that is not what you want.

You should change np.sum(e, axis=0) to np.sum(e, axis=1, keepdims=True), so that you get

[[ 3.68403911]
 [ 4.98787384]]

instead, which is what you actually want. Then you will get the right result.

I also recommend reading the rules of broadcasting in numpy. They describe how addition/subtraction/multiplication/division work on two arrays with different shapes.
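
As a rough illustration of those rules, note how the divisor's shape decides what gets normalized (a sketch, with e as above):

e / np.sum(e, axis=0)                  # (2, 3) / (3,)   -> column-normalized (the bug)
e / np.sum(e, axis=1, keepdims=True)   # (2, 3) / (2, 1) -> row-normalized (what you want)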



Perhaps this may be enlightening:

>>> np.sum(test_output, axis=1)
array([ 1.,  1.])

Notice that each row is normalized. In other words, they want you to compute softmax for each row independently.
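
As a quick check (assuming the corrected axis=1 version of softmax from the answers above), the rows of the output now sum to one:

>>> np.sum(softmax(test_array), axis=1)
array([ 1.,  1.])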

