
This example is taken verbatim from the PyTorch documentation. I have some background in deep learning, so it is clear to me that the forward call represents a forward pass: the input passes through the layers until it reaches the end, producing 10 outputs in this case, and you then take the output of the forward pass and compute the loss with the loss function one has defined. What I have forgotten is what exactly the output of forward() is in this scenario.

I thought that the last layer of a neural network should be some sort of activation function like sigmoid() or softmax(), but I don't see either defined anywhere here. Furthermore, in a project I am working on now, I found that softmax() is called later on, after the forward pass. So I just want to clarify what exactly outputs = net(inputs) gives me. From this link, it seems that by default the output of a PyTorch model's forward pass is logits?

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        print(outputs)  # inspect the raw forward output
        break           # debugging only: stops before training actually runs
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
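For reference, a minimal sketch of what that print(outputs) shows, using a random dummy batch instead of the loader (the exact numbers depend on the randomly initialized weights):

dummy_batch = torch.randn(4, 3, 32, 32)  # stand-in for one CIFAR-10 batch of 4 images

outputs = net(dummy_batch)
print(outputs.shape)     # torch.Size([4, 10]) -- 10 values per image, one per class
print(outputs[0])        # unbounded real numbers, not probabilities
print(outputs[0].sum())  # does not sum to 1; no softmax has been applied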
2 Comments

  • There is no such thing as a default output of a forward function in PyTorch. Commented Nov 24, 2020 at 15:21
  • When no layer with a nonlinearity is added at the end of the network, the output is basically a real-valued scalar, vector, or tensor. Commented Nov 24, 2020 at 22:54

1 Answer


it seems to me by default the output of a PyTorch model's forward pass is logits

As I can see from the forward pass, yes, your function is returning the raw output of the last layer (the logits):

def forward(self, x):
  x = self.pool(F.relu(self.conv1(x)))
  x = self.pool(F.relu(self.conv2(x)))
  x = x.view(-1, 16 * 5 * 5)
  x = F.relu(self.fc1(x))
  x = F.relu(self.fc2(x))
  x = self.fc3(x)
  return x

So, where is softmax? Right here:

criterion = nn.CrossEntropyLoss()

It's a bit hidden, but the softmax computation is handled inside this loss function: nn.CrossEntropyLoss applies a log-softmax to the raw output of your last layer and then computes the negative log-likelihood loss on the result.
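You can check this equivalence directly. A minimal sketch (with arbitrary logits and labels, not from the tutorial) showing that nn.CrossEntropyLoss gives the same result as log-softmax followed by negative log-likelihood:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 10)          # raw outputs, like the ones returned by fc3
labels = torch.tensor([3, 0, 9, 1])  # arbitrary class indices

ce = nn.CrossEntropyLoss()(logits, labels)
nll = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(ce, nll))  # True: the (log-)softmax happens inside the loss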

This is the softmax calculation:

$$\operatorname{softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}$$

where z_i are the raw outputs of the neural network
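A quick numeric check of that formula against PyTorch's own implementation (illustrative values, chosen arbitrarily):

import torch

z = torch.tensor([2.0, 1.0, 0.1])           # raw outputs z_i
manual = torch.exp(z) / torch.exp(z).sum()  # e^{z_i} / sum_j e^{z_j}

print(manual)                   # tensor([0.6590, 0.2424, 0.0986])
print(torch.softmax(z, dim=0))  # same values
print(manual.sum())             # sums to 1: a valid probability distribution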

So, in conclusion, there is no activation function after your last layer because it's handled by the nn.CrossEntropyLoss class

As for what the raw output of nn.Linear is: each output value of a linear layer is a linear combination of the values coming from the neurons of the previous layer, i.e. a weighted sum of the inputs plus a bias
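A minimal sketch of that claim (sizes chosen arbitrarily, to match the comment below): nn.Linear computes x @ W.T + b.

import torch
import torch.nn as nn

layer = nn.Linear(20, 5)  # 20 inputs -> 5 outputs
x = torch.randn(1, 20)

manual = x @ layer.weight.T + layer.bias  # weighted sum of the inputs, plus a bias
print(torch.allclose(layer(x), manual))   # True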


6 Comments

Thank you! So, if the layer before my final linear layer has 20 neurons/output values, and my linear layer has 5 outputs/classes, I can expect the output of the linear layer to be an array of 5 values, each of which is the linear combination of the 20 values multiplied by the 20 weights, plus a bias?
@ilovewt Yes, that's correct. The raw output is then combined with softmax inside the loss to produce probabilities
To get the softmax predictions, what I did is something like softmax_preds = torch.nn.Softmax(dim=1)(input=raw_outputs).to('cpu').detach().numpy(), because even though nn.CrossEntropyLoss() does incorporate softmax inside, all it gives me is the loss when I call loss = criterion(raw_outputs, labels). Is this right?
@ilovewt Yes, it is correct. Anyway, I suggest you open a new question if you have any new problems or implementation issues that the docs don't clear up (PyTorch is very well documented :) pytorch.org/docs/stable/generated/torch.nn.Softmax.html, pytorch.org/tutorials/beginner/nlp/deep_learning_tutorial.html). It's better to stay on topic for your current question
Feel free to tag me. Unfortunately I'm not that much of a PyTorch expert (I know Keras/TF better :)), but if I know the answer I'll help
