Following https://classroom.udacity.com/courses/ud730/lessons/6370362152/concepts/63815621490923, I'm trying to write a "softmax" function which, when given a 2-dimensional array as input, calculates the softmax of each column. I wrote the following script to test it:

import numpy as np

#scores=np.array([1.0,2.0,3.0])

scores=np.array([[1,2,3,6],
                [2,4,5,6],
                [3,8,7,6]])

def softmax(x):
    if x.ndim==1:
        S=np.sum(np.exp(x))
        return np.exp(x)/S
    elif x.ndim==2:
        result=np.zeros_like(x)
        M,N=x.shape
        for n in range(N):
            S=np.sum(np.exp(x[:,n]))
            result[:,n]=np.exp(x[:,n])/S
        return result
    else:
        print("The input array is not 1- or 2-dimensional.")

s=softmax(scores)
print(s)

However, the result "s" turns out to be an array of zeros:

[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]

If I remove the "/S" in the for-loop, the un-normalized result is what I would expect; somehow the "/S" division appears to make all the elements zero instead of dividing each element by S. What is wrong with the code?

The problem is in result=np.zeros_like(x), which creates an array of integers if x is an array of integers. When you assign the normalized values (all in the interval 0<n<1) element by element to an array of integers, they are converted to integers and hence forced to zero. The un-normalized exponentials, by contrast, are merely truncated toward zero. If you want to keep the loop, you can instantiate result as np.zeros(x.shape); if you'd rather 1) not use an auxiliary array and 2) remove the loop, please have a look at my answer below. ciao Commented Apr 20, 2016 at 10:01

3 Answers

The "zeros" trace back to the data type of the input, which is "int": np.zeros_like(x) inherits that type, so the normalized values are truncated when they are stored in result. Converting the input to "float" solved the problem:

import numpy as np

#scores=np.array([1.0,2.0,3.0])

scores=np.array([[1,2,3,6],
                [2,4,5,6],
                [3,8,7,6]])

def softmax(x):
    x=x.astype(float)
    if x.ndim==1:
        S=np.sum(np.exp(x))
        return np.exp(x)/S
    elif x.ndim==2:
        result=np.zeros_like(x)
        M,N=x.shape
        for n in range(N):
            S=np.sum(np.exp(x[:,n]))
            result[:,n]=np.exp(x[:,n])/S
        return result
    else:
        print("The input array is not 1- or 2-dimensional.")

s=softmax(scores)
print(s)

Note that I've added "x=x.astype(float)" as the first line of the function body. This yields the expected output:

[[ 0.09003057  0.00242826  0.01587624  0.33333333]
 [ 0.24472847  0.01794253  0.11731043  0.33333333]
 [ 0.66524096  0.97962921  0.86681333  0.33333333]]

1 Comment

The reason for the "zeros" does NOT lie in the data type of the inputs; it depends on how you instantiate the result array, using np.zeros_like(x) instead of np.zeros(x.shape).

The problem in your code is how you instantiate the placeholder for the results that you're about to compute, that is

    result=np.zeros_like(x)

because if x is an array of integers, result is also an array of integers, and when you assign to it,

        result[:,n]=np.exp(x[:,n])/S

a conversion to integer is enforced. When you normalize by dividing by S, all the numbers to be converted lie in the interval (0, 1); the conversion truncates toward zero, so you end up with an array of zeros.

You said that, if you don't normalize, the result is different from zero... that's because in this case the numbers converted to integers are LARGER than 1.
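The truncation described above can be reproduced on a single column. This is only a minimal sketch: the names col, weights, int_result, float_result and unnormalized are made up for the illustration and do not appear in the question.

```python
import numpy as np

col = np.array([1, 2, 3])                     # integer column, like those in the question
weights = np.exp(col) / np.sum(np.exp(col))   # float values, each strictly between 0 and 1

int_result = np.zeros_like(col)      # inherits the integer dtype of col
int_result[:] = weights              # assignment casts to int, truncating toward zero

float_result = np.zeros(col.shape)   # np.zeros defaults to float64
float_result[:] = weights            # values are stored unchanged

unnormalized = np.zeros_like(col)    # integer placeholder again
unnormalized[:] = np.exp(col)        # 2.718..., 7.389..., 20.085... lose their fractional part

print(int_result)     # all zeros
print(float_result)   # the normalized weights
print(unnormalized)   # truncated exponentials: 2, 7, 20
```

The only difference between the first two placeholders is the dtype they were created with, which is why the normalized values vanish in one and survive in the other.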

A possible solution, which you can use in your code as is, is to instantiate an array of floats, irrespective of the type of x,

    result=np.zeros(x.shape)

but I have to say that your code computes the exponential twice and uses loops where you could use vectorized operations.

Here is a different implementation that (a) avoids loops and (b) avoids unnecessary evaluations of the exponential,

def sm(a):
    s = np.exp(a)
    if a.ndim == 1:
        return s/s.sum()
    elif a.ndim == 2:
        return s/s.sum(0) 
    else:
        return

A small test,

In [32]: sm(np.array([[1,2,3,6],
                [2,4,5,6],
                [3,8,7,6]]))
Out[32]: 
array([[ 0.09003057,  0.00242826,  0.01587624,  0.33333333],
       [ 0.24472847,  0.01794253,  0.11731043,  0.33333333],
       [ 0.66524096,  0.97962921,  0.86681333,  0.33333333]])

In [33]: 

Note that it also works perfectly with an integer array as input.
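A quick way to see why the integer input is harmless here, as a sketch with illustrative variable names: np.exp already returns a floating-point array, and the division allocates a new float array instead of writing into an integer placeholder.

```python
import numpy as np

a = np.array([[1, 2, 3, 6],
              [2, 4, 5, 6],
              [3, 8, 7, 6]])

s = np.exp(a)        # exponential of an integer array is a float array
out = s / s.sum(0)   # division creates a new float array, nothing is cast back to int

print(a.dtype.kind)          # 'i', the input is an integer array
print(s.dtype, out.dtype)    # both floating point
print(out.sum(0))            # every column sums to 1, up to rounding
```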

Addendum

Following the suggestion from n13 the function can be rewritten as

def sm(a):
    s = np.exp(a)
    if a.ndim <3: return s/s.sum(0) 

Thank you n13.

PS: when I wrote the addendum I had not realized that n13 had posted an answer of their own...

2 Comments

Note: the if-else is not needed here, s.sum(0) - the else branch here - handles all input dimensions.
@n13 I hadn't noticed the possible nice generalization — answer edited accordingly. Thanks a lot.

NumPy has some nifty array operations that make this problem a lot simpler to solve.

np.exp calculates the exponential element-wise on an array of any dimension.

The sum() method takes an axis argument, which lets us restrict the sum to a given axis; summing over each column corresponds to axis 0 in our case.

import numpy as np

def softmax(x):
    exp = np.exp(x)          # element-wise exponential of every entry
    return exp / exp.sum(0)  # axis 0 sums down each column; broadcasting divides column by column
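As a sanity check, the function can be compared against an explicit per-column loop on the scores array from the question. The sketch below repeats the definition so that it runs on its own; the name looped is introduced here purely for the comparison.

```python
import numpy as np

def softmax(x):
    exp = np.exp(x)
    return exp / exp.sum(0)

scores = np.array([[1, 2, 3, 6],
                   [2, 4, 5, 6],
                   [3, 8, 7, 6]])

s = softmax(scores)
print(s.sum(0))   # each column sums to 1, up to rounding

# Reference result: normalize one column at a time, then stack the columns again
looped = np.column_stack([
    np.exp(scores[:, n]) / np.exp(scores[:, n]).sum()
    for n in range(scores.shape[1])
])
print(np.allclose(s, looped))

# A 1-dimensional input goes through the same code path
v = softmax(np.array([1.0, 2.0, 3.0]))
print(v.sum())
```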
