
I'm trying to understand this code from lightaime's GitHub page. It is a vectorized softmax method. What confuses me is "softmax_output[range(num_train), list(y)]"

What does this expression mean?

import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized implementation.
    Inputs have dimension D, there are C classes, and we operate on minibatches of N examples.

    Inputs:
        W: A numpy array of shape (D, C) containing weights.
        X: A numpy array of shape (N, D) containing a minibatch of data.
        y: A numpy array of shape (N,) containing training labels; y[i] = c means that X[i] has label c, where 0 <= c < C.
        reg: (float) regularization strength

    Returns a tuple of:
        loss as single float
        gradient with respect to weights W; an array of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)

    num_classes = W.shape[1]
    num_train = X.shape[0]
    scores = X.dot(W)
    # Shift scores by each row's max for numerical stability.
    shift_scores = scores - np.max(scores, axis=1).reshape(-1, 1)
    softmax_output = np.exp(shift_scores) / np.sum(np.exp(shift_scores), axis=1).reshape(-1, 1)
    # Pick out the correct-class probability for each example.
    loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)

    dS = softmax_output.copy()
    dS[range(num_train), list(y)] -= 1
    dW = (X.T).dot(dS)
    dW = dW / num_train + reg * W
    return loss, dW
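Not part of the original snippet, but the gradient formula (subtracting 1 at each example's correct class in dS) can be sanity-checked numerically. The sketch below re-implements the same math in a compact helper (names and shapes chosen arbitrarily for illustration) and compares one entry of dW against a central-difference estimate:

```python
import numpy as np

def loss_and_grad(W, X, y, reg):
    # Same math as softmax_loss_vectorized, condensed.
    n = X.shape[0]
    s = X.dot(W)
    s = s - s.max(axis=1, keepdims=True)
    p = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)
    loss = -np.log(p[range(n), list(y)]).sum() / n + 0.5 * reg * np.sum(W * W)
    dS = p.copy()
    dS[range(n), list(y)] -= 1
    dW = X.T.dot(dS) / n + reg * W
    return loss, dW

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3))   # D=5, C=3
X = rng.standard_normal((4, 5))   # N=4
y = np.array([0, 2, 1, 2])

loss, dW = loss_and_grad(W, X, y, reg=0.1)

# Central-difference estimate of one gradient entry.
h = 1e-5
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += h
Wm[0, 0] -= h
num = (loss_and_grad(Wp, X, y, 0.1)[0] - loss_and_grad(Wm, X, y, 0.1)[0]) / (2 * h)
print(np.isclose(dW[0, 0], num, atol=1e-4))  # True
```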

2 Answers


This expression means: index the array softmax_output of shape (N, C), extracting from it only the values corresponding to the training labels y.

A two-dimensional numpy.array can be indexed with two lists containing appropriate values (i.e. values that do not cause an index error).

range(num_train) creates an index for the first axis, which allows selecting a specific value in each row via the second index, list(y). You can find this in the numpy documentation on integer array indexing.

The first index, range(num_train), has a length equal to the first dimension of softmax_output (= N). It points to each row of the matrix; then, for each row, the corresponding value from the second index, list(y), selects the target column.

Example:

softmax_output = np.array(  # dummy values, not softmax
    [[1, 2, 3], 
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]]
)
num_train = 4  # length of the array
y = [2, 1, 0, 2]  # labels; values for indexing along the second axis
softmax_output[range(num_train), list(y)]
Out:
array([ 3,  5,  7, 12])

So it selects the third element from the first row, the second from the second row, and so on. That's how it works.
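The same selection can also be written in a couple of equivalent ways; a short sketch using the dummy array from above (np.take_along_axis is one alternative, not something used in the original code):

```python
import numpy as np

softmax_output = np.array(  # dummy values, not softmax
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]]
)
y = np.array([2, 1, 0, 2])

# Integer-array ("fancy") indexing: row i is paired with column y[i].
a = softmax_output[np.arange(len(y)), y]

# Equivalent selection via take_along_axis (produces a column, so ravel it).
b = np.take_along_axis(softmax_output, y.reshape(-1, 1), axis=1).ravel()

print(a)                    # [ 3  5  7 12]
print(np.array_equal(a, b)) # True
```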

(P.S. Or did I misunderstand you, and you are interested in the "why", not the "how"?)




The loss here is the cross-entropy loss, defined by the following equation:

    L = -Σ_i Σ_j y_ij * log(p_ij)

Here, y_ij is 1 for the class that datapoint i belongs to and 0 for all other classes, so we are only interested in the softmax output for each datapoint's own class. The equation above can therefore be rewritten as:

    L = -Σ_i log(p_i,y_i)

The following code implements this equation (the division by num_train happens on the next line of the function):

loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))

The code softmax_output[range(num_train), list(y)] selects the softmax outputs for the respective classes: range(num_train) indexes all the training samples, and list(y) gives the correct class for each of them.
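As a rough sketch (random data, with shapes chosen purely for illustration), the selected entries are exactly the per-example correct-class probabilities that enter the loss, which can be verified against a naive per-example loop:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, C = 4, 5, 3
W = rng.standard_normal((D, C))
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)
reg = 0.1

# Vectorized, as in the function above.
scores = X.dot(W)
shift = scores - scores.max(axis=1).reshape(-1, 1)
p = np.exp(shift) / np.exp(shift).sum(axis=1).reshape(-1, 1)
loss_vec = -np.sum(np.log(p[range(N), list(y)])) / N + 0.5 * reg * np.sum(W * W)

# Naive loop over examples for comparison.
loss_loop = 0.0
for i in range(N):
    s = X[i].dot(W)
    s = s - s.max()
    q = np.exp(s) / np.exp(s).sum()
    loss_loop += -np.log(q[y[i]])  # -log of the correct-class probability
loss_loop = loss_loop / N + 0.5 * reg * np.sum(W * W)

print(np.isclose(loss_vec, loss_loop))  # True
```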

This indexing is nicely explained by Mikhail in his answer.

