
I am using Hugging Face Transformers to implement a BERT model with BertForSequenceClassification.from_pretrained().

The model predicts one of 24 classes. I am using a batch size of 32 and a sequence length of 66.

When I try to call the model in training, I get the following error:

ValueError: Expected input batch_size (32) to match target batch_size (768).

However, my target shape is (32, 24). Since 32 × 24 = 768, it seems the labels tensor is being flattened somewhere inside the model call. Here is a test I ran to check:

for i in train_dataloader:
    i = tuple(t.to(device) for t in i)
    print(i[0].shape, i[1].shape, i[2].shape)  # i[2].shape is (32, 24) here
    output = model(i[0], attention_mask=i[1], labels=i[2])  # fails: target batch size is suddenly 768
    print(output.logits.shape)
    break

This outputs:

torch.Size([32, 66]) torch.Size([32, 66]) torch.Size([32, 24])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-68-c69db6168cc3> in <module>
      2     i = tuple(t.to(device) for t in i)
      3     print(i[0].shape, i[1].shape, i[2].shape)
----> 4     output = model(i[0], attention_mask=i[1], labels=i[2])
      5     print(output.logits.shape)
      6     break

4 frames
/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   3024     if size_average is not None or reduce is not None:
   3025         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3026     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
   3027 
   3028 

ValueError: Expected input batch_size (32) to match target batch_size (768).
  • Language models expect sequence inputs and outputs (they are sequence-to-sequence models), so you cannot use a 2D target tensor and flattening is required. Commented Dec 13, 2022 at 13:19
  • But the first dimension is the batch and the second dimension is the sequence. Commented Dec 13, 2022 at 13:25
  • Also, I am using BertForSequenceClassification, so it has a classification head on top of the LM. Commented Dec 13, 2022 at 13:33
  • Without seeing the particular implementation of the BERT model you're using, I suggest stepping through the model in debug mode in your IDE of choice, examining the size of the variables at each operation until you find the offending one. This should help you quickly converge on the expected I/O format for the model. Commented Dec 13, 2022 at 15:07

1 Answer


PyTorch's implementation of CrossEntropyLoss expects targets to be integer class indices, not one-hot class vectors. Thus the target should be of shape [batch_size], not [batch_size, n_classes].
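
That also explains the 768 in the error: with integer-typed labels and num_labels > 1, BertForSequenceClassification treats the task as single-label classification and computes the loss roughly like this (a simplified sketch of the Hugging Face forward pass, not a verbatim copy):

loss_fct = torch.nn.CrossEntropyLoss()
# logits has shape (32, 24); labels.view(-1) flattens your (32, 24)
# one-hot labels into 768 entries, hence "target batch_size (768)"
loss = loss_fct(logits.view(-1, num_labels), labels.view(-1))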

You can ravel your classes quite simply as follows (provided each class vector is indeed one-hot):

raveler = torch.arange(0, n_classes).unsqueeze(0).expand(batch_size, n_classes)
target = (target * raveler).sum(dim=1)  # one-hot (batch_size, n_classes) -> indices (batch_size,)
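
Equivalently, since each row is one-hot, taking the argmax over the class dimension gives the same indices, and you can apply it right where the labels are passed in (reusing the loop variables from the question):

target = target.argmax(dim=1)  # same result for one-hot rows
output = model(i[0], attention_mask=i[1], labels=i[2].argmax(dim=1))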

1 Comment

Great, thanks! So the model will output logits, and CrossEntropyLoss will then do all the work of computing the loss between the logits and the target indices?
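
For reference, yes: nn.CrossEntropyLoss combines log-softmax and negative log-likelihood, so it consumes raw logits and integer class indices directly. A minimal self-contained sketch (shapes chosen to match the question):

import torch

logits = torch.randn(32, 24)           # raw, unnormalized scores from the model
target = torch.randint(0, 24, (32,))   # integer class indices, shape (batch_size,)
loss = torch.nn.CrossEntropyLoss()(logits, target)
print(loss)  # scalar loss; log-softmax is applied internally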
