
I have created a multi-class classification neural network. Training and validation iterators were created with the BucketIterator method, with fields {'text_normalized_tweet': TEXT, 'label': LABEL}

TEXT = a tweet
LABEL = a float number (with 3 values: 0, 1, 2)
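For context, a minimal sketch of how such iterators are typically built with torchtext's legacy BucketIterator (an assumed reconstruction; the actual Field and Dataset definitions aren't shown in the question):

from torchtext.legacy import data

#assumed field definitions -- include_lengths=True makes batch.text_normalized_tweet
#a (text, text_lengths) tuple, matching how the batch is unpacked further down
TEXT = data.Field(batch_first=True, include_lengths=True)
LABEL = data.LabelField()

#train_data and valid_data are assumed to be torchtext Dataset objects
train_iterator, valid_iterator = data.BucketIterator.splits(
    (train_data, valid_data),
    batch_size=32,
    sort_key=lambda x: len(x.text_normalized_tweet),
    sort_within_batch=True,
)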

Below is a dummy example of my neural network:

import torch.nn as nn

class MultiClassClassifer(nn.Module):
  #define all the layers used in model
  def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
    
    #Constructor
    super(MultiClassClassifer, self).__init__()

    #embedding layer
    self.embedding = nn.Embedding(vocab_size, embedding_dim)

    #dense layer
    self.hiddenLayer = nn.Linear(embedding_dim, hidden_dim)

    #Batch normalization layer
    self.batchnorm = nn.BatchNorm1d(hidden_dim)

    #output layer
    self.output = nn.Linear(hidden_dim, output_dim)

    #activation layer
    self.act = nn.Softmax(dim=1) #2d-tensor

    #initialize weights of embedding layer
    self.init_weights()

  def init_weights(self):

    initrange = 1.0
    
    self.embedding.weight.data.uniform_(-initrange, initrange)
  
  def forward(self, text, text_lengths):

    embedded = self.embedding(text)

    #packed sequence
    packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths, batch_first=True)

    #packed_embedded is a PackedSequence: [0] is the flattened .data tensor, [1] is .batch_sizes
    tensor, batch_size = packed_embedded[0], packed_embedded[1]

    hidden_1 = self.batchnorm(self.hiddenLayer(tensor))

    return self.act(self.output(hidden_1))

Instantiate the model

INPUT_DIM = len(TEXT.vocab)
EMBEDDING_DIM = 100
HIDDEN_DIM = 64
OUTPUT_DIM = 3

model = MultiClassClassifer(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM)

When I call

text, text_lengths = batch.text_normalized_tweet
                
predictions = model(text, text_lengths).squeeze()

loss = criterion(predictions, batch.label)

it returns,

ValueError: Expected input batch_size (416) to match target batch_size (32).

model(text, text_lengths).squeeze() = torch.Size([416, 3])
batch.label = torch.Size([32])
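
For context, assuming the criterion is nn.CrossEntropyLoss (it isn't shown in the question): it expects predictions of shape [batch_size, num_classes] and integer class targets of shape [batch_size], so the two batch dimensions must match.

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
predictions = torch.randn(32, 3)       #[batch_size, num_classes]
labels = torch.randint(0, 3, (32,))    #[batch_size], integer class ids
loss = criterion(predictions, labels)  #works: both batch sizes are 32

Note that nn.CrossEntropyLoss applies log-softmax internally, so if that is indeed the criterion, the trailing nn.Softmax layer is usually dropped.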

I can see that the two objects have different sizes, but I have no clue how to fix this.

You may find the Google Colab notebook here

Shapes of each input/output tensor of my forward() method:

torch.Size([32, 10, 100]) #self.embedding(text)
torch.Size([320, 100]) #nn.utils.rnn.pack_padded_sequence(embedded, text_lengths, batch_first=True)
torch.Size([320, 64]) #self.batchnorm(self.hiddenLayer(tensor))
torch.Size([320, 3]) #self.act(self.output(hidden_1))
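
The jump from [32, 10, 100] to [320, 100] is the crux: pack_padded_sequence concatenates all time steps of all sequences into one flat .data tensor of shape [sum(text_lengths), embedding_dim] (32 × 10 = 320 here; 32 × 13 = 416 in the failing batch). A toy sketch illustrating this (shapes are my assumption):

import torch
import torch.nn as nn

embedded = torch.randn(4, 5, 100)    #toy batch: [batch=4, seq_len=5, emb=100]
lengths = torch.tensor([5, 5, 5, 5])

packed = nn.utils.rnn.pack_padded_sequence(embedded, lengths, batch_first=True)
print(packed.data.shape)    #torch.Size([20, 100]) -- batch and time flattened together
print(packed.batch_sizes)   #tensor([4, 4, 4, 4, 4])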
  • What's the dimension of model(text, text_lengths)? Why are you using squeeze()? Commented Dec 15, 2021 at 15:19
  • @kkgarg it's torch.Size([416, 3])... I think squeeze can be omitted. I am new to PyTorch, so not all keywords are familiar to me. I have posted the shapes at the end. Commented Dec 15, 2021 at 15:28
  • To debug, I'd start by noting down the dimensions after every step in the forward pass. Finally, the output of model(text, text_lengths) should be [32, 3] if your batch size is 32 and the number of classes is 3. Try to refer to the PyTorch documentation for the individual functions, e.g. pytorch.org/docs/stable/generated/… Commented Dec 15, 2021 at 15:39
  • @kkgarg hidden_1 = self.batchnorm(self.hiddenLayer(tensor)) has shape [32, 13, 3] and output = self.act(self.output(hidden_1)) has shape [32 × 13, 3] = [416, 3] Commented Dec 15, 2021 at 15:43
  • @kkgarg please check my update with the shapes per tensor in forward() Commented Dec 15, 2021 at 15:59

1 Answer


You shouldn't be using the squeeze function after the forward pass; that doesn't make sense here.

After removing the squeeze call, as you can see, the shape of your final output is [320, 3] whereas the criterion expects [32, 3]. One way to fix this is to average the embeddings you obtain for each word after the self.embedding step, as shown below:

def forward(self, text, text_lengths):

    embedded = self.embedding(text)                       #[batch, seq_len, embedding_dim]
    embedded = torch.mean(embedded, dim=1, keepdim=True)  #[batch, 1, embedding_dim]

    packed_embedded = nn.utils.rnn.pack_padded_sequence(embedded, text_lengths, batch_first=True)
    tensor, batch_size = packed_embedded[0], packed_embedded[1]

    hidden_1 = self.batchnorm(self.hiddenLayer(tensor))
    return self.act(self.output(hidden_1))
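
A variant of the same idea (my sketch, not part of the original answer): since averaging already collapses the sequence dimension, the packing step can be dropped entirely, which avoids the PackedSequence bookkeeping:

def forward(self, text, text_lengths):

    embedded = self.embedding(text)          #[batch, seq_len, embedding_dim]
    pooled = embedded.mean(dim=1)            #[batch, embedding_dim] -- sequence dim averaged out

    hidden_1 = self.batchnorm(self.hiddenLayer(pooled))
    return self.act(self.output(hidden_1))   #[batch, output_dim]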

2 Comments

Indeed, averaging the embeddings makes sense and seems to fix the problem. I will accept and upvote your answer. Moving to the next step, loss.backward(), I receive this error: IndexError: select(): index 1 out of range for tensor of size [1, 32, 100] at dimension 0. Shall I create a new question for that, or is it related to the embedding averaging?
That should be a separate question, I guess.
