
I'm trying to understand this code from lightaime's GitHub page. It is a vectorized softmax method. What confuses me is "softmax_output[range(num_train), list(y)]"

What does this expression mean?

import numpy as np

def softmax_loss_vectorized(W, X, y, reg):
    """
    Softmax loss function, vectorized implementation.
    Inputs have dimension D, there are C classes, and we operate on minibatches of N examples.

    Inputs:
        W: A numpy array of shape (D, C) containing weights.
        X: A numpy array of shape (N, D) containing a minibatch of data.
        y: A numpy array of shape (N,) containing training labels; y[i] = c means that X[i] has label c, where 0 <= c < C.
        reg: (float) regularization strength

    Returns a tuple of:
        loss as single float
        gradient with respect to weights W; an array of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)

    num_classes = W.shape[1]
    num_train = X.shape[0]
    scores = X.dot(W)
    # Shift scores by each row's max for numerical stability.
    shift_scores = scores - np.max(scores, axis=1).reshape(-1, 1)
    softmax_output = np.exp(shift_scores) / np.sum(np.exp(shift_scores), axis=1).reshape(-1, 1)
    # Pick out the correct-class probability for each example.
    loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)

    dS = softmax_output.copy()
    dS[range(num_train), list(y)] -= 1
    dW = (X.T).dot(dS)
    dW = dW / num_train + reg * W
    return loss, dW
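Not part of the original snippet, but the gradient formula (subtracting 1 at each example's correct class in dS) can be sanity-checked numerically. The sketch below re-implements the same math in a compact helper (names and shapes chosen arbitrarily for illustration) and compares one entry of dW against a central-difference estimate:

```python
import numpy as np

def loss_and_grad(W, X, y, reg):
    # Same math as softmax_loss_vectorized, condensed.
    n = X.shape[0]
    s = X.dot(W)
    s = s - s.max(axis=1, keepdims=True)
    p = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)
    loss = -np.log(p[range(n), list(y)]).sum() / n + 0.5 * reg * np.sum(W * W)
    dS = p.copy()
    dS[range(n), list(y)] -= 1
    dW = X.T.dot(dS) / n + reg * W
    return loss, dW

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3))   # D=5, C=3
X = rng.standard_normal((4, 5))   # N=4
y = np.array([0, 2, 1, 2])

loss, dW = loss_and_grad(W, X, y, reg=0.1)

# Central-difference estimate of one gradient entry.
h = 1e-5
Wp, Wm = W.copy(), W.copy()
Wp[0, 0] += h
Wm[0, 0] -= h
num = (loss_and_grad(Wp, X, y, 0.1)[0] - loss_and_grad(Wm, X, y, 0.1)[0]) / (2 * h)
print(np.isclose(dW[0, 0], num, atol=1e-4))  # True
```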

2 Answers


This expression means: index the array softmax_output of shape (N, C), extracting from it only the values corresponding to the training labels y.

A two-dimensional numpy.array can be indexed with two lists containing appropriate values (i.e. values that do not cause an index error).

range(num_train) creates an index for the first axis, which allows selecting a specific value in each row via the second index, list(y). You can find this in the numpy documentation on integer array indexing.

The first index, range(num_train), has a length equal to the first dimension of softmax_output (= N). It points to each row of the matrix; then, for each row, the corresponding value from the second index, list(y), selects the target column.

Example:

softmax_output = np.array(  # dummy values, not softmax
    [[1, 2, 3], 
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]]
)
num_train = 4  # length of the array
y = [2, 1, 0, 2]  # labels; values for indexing along the second axis
softmax_output[range(num_train), list(y)]
Out:
array([ 3,  5,  7, 12])

So it selects the third element from the first row, the second from the second row, and so on. That's how it works.
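The same selection can also be written in a couple of equivalent ways; a short sketch using the dummy array from above (np.take_along_axis is one alternative, not something used in the original code):

```python
import numpy as np

softmax_output = np.array(  # dummy values, not softmax
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]]
)
y = np.array([2, 1, 0, 2])

# Integer-array ("fancy") indexing: row i is paired with column y[i].
a = softmax_output[np.arange(len(y)), y]

# Equivalent selection via take_along_axis (produces a column, so ravel it).
b = np.take_along_axis(softmax_output, y.reshape(-1, 1), axis=1).ravel()

print(a)                    # [ 3  5  7 12]
print(np.array_equal(a, b)) # True
```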

(P.S. Or did I misunderstand you, and you are interested in the "why", not the "how"?)




The loss here is the cross-entropy loss, defined by the following equation:

    L = -Σ_i Σ_j y_ij * log(p_ij)

Here, y_ij is 1 for the class that datapoint i belongs to and 0 for all other classes, so we are only interested in the softmax output for each datapoint's own class. The equation above can therefore be rewritten as:

    L = -Σ_i log(p_i,y_i)

The following code implements this equation (the division by num_train happens on the next line of the function):

loss = -np.sum(np.log(softmax_output[range(num_train), list(y)]))

The code softmax_output[range(num_train), list(y)] selects the softmax outputs for the respective classes: range(num_train) indexes all the training samples, and list(y) gives the correct class for each of them.
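As a rough sketch (random data, with shapes chosen purely for illustration), the selected entries are exactly the per-example correct-class probabilities that enter the loss, which can be verified against a naive per-example loop:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, C = 4, 5, 3
W = rng.standard_normal((D, C))
X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)
reg = 0.1

# Vectorized, as in the function above.
scores = X.dot(W)
shift = scores - scores.max(axis=1).reshape(-1, 1)
p = np.exp(shift) / np.exp(shift).sum(axis=1).reshape(-1, 1)
loss_vec = -np.sum(np.log(p[range(N), list(y)])) / N + 0.5 * reg * np.sum(W * W)

# Naive loop over examples for comparison.
loss_loop = 0.0
for i in range(N):
    s = X[i].dot(W)
    s = s - s.max()
    q = np.exp(s) / np.exp(s).sum()
    loss_loop += -np.log(q[y[i]])  # -log of the correct-class probability
loss_loop = loss_loop / N + 0.5 * reg * np.sum(W * W)

print(np.isclose(loss_vec, loss_loop))  # True
```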

This indexing is nicely explained by Mikhail in his answer.

