I am using Google Colab and get the error below. I can't find anything wrong with my code: it ran fine a few hours ago but suddenly started failing, and I don't know why.

import torch
if torch.cuda.is_available():       
    device = torch.device("cuda")
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('We will use the GPU:', torch.cuda.get_device_name(0))
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")
seed=1
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True 

The error is

There are 1 GPU(s) available.
We will use the GPU: Tesla P100-PCIE-16GB
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-121-436d9d8bb120> in <module>()
      9 seed=1
     10 np.random.seed(seed)
---> 11 torch.manual_seed(seed)
     12 torch.cuda.manual_seed_all(seed)
     13 torch.backends.cudnn.deterministic = True

3 frames
/usr/local/lib/python3.7/dist-packages/torch/cuda/random.py in cb()
    109         for i in range(device_count()):
    110             default_generator = torch.cuda.default_generators[i]
--> 111             default_generator.manual_seed(seed)
    112 
    113     _lazy_call(cb, seed_all=True)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Could anyone help me?

2 Answers


In my experience, this error can occur when the labels in your targets are inconsistent with the number of classes your model is configured for.

To solve it, you can try the following:

  1. Make sure that the labels in your target data start from 0. If you have n classes in your data, your target classes should be [0, 1, 2, ..., n-1].
  2. Make sure that the model you are using is set up to work with n classes.
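
The first check above can be sketched in a few lines of plain Python. This is only an illustration: `find_invalid_labels`, `labels`, and `num_classes` are hypothetical names that do not come from the question, and the sample data is made up. For a PyTorch tensor of targets, you could pass `targets.tolist()` instead of a list.

```python
def find_invalid_labels(labels, num_classes):
    """Return the sorted distinct label values outside [0, num_classes - 1]."""
    return sorted({int(y) for y in labels if not 0 <= int(y) < num_classes})


# Hypothetical data: 3 classes labelled 1..3 instead of 0..2.
labels = [1, 2, 3, 1, 3]
num_classes = 3

bad = find_invalid_labels(labels, num_classes)
print(bad)  # [3]

# Shifting to 0-based labels fixes this particular case.
shifted = [y - 1 for y in labels]
print(find_invalid_labels(shifted, num_classes))  # []
```

If the list of invalid labels is non-empty, either the labels need remapping or the model needs to be configured with more classes.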


Well, the accepted answer seems very strange. The code in the question does not do any tensor manipulation; it only initializes the random number generators.

In my experience, this error can happen when the environment is not properly initialized, for example when the cuDNN library is not loaded or there is some other CUDA-related problem. In my case, the error was caused by a missing call to source a Conda script (source /net/software/v1/software/Miniconda3/4.9.2/etc/profile.d/conda.sh).

1 Comment

The other answer probably refers to using the transformers library and either messing up your labels or forgetting to specify the number of classes for the model. For some reason, the exact same error pops up in that case, even when the environment is working.
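
One way to tell these two causes apart, as the error message itself suggests, is to rerun with CUDA_LAUNCH_BLOCKING=1. CUDA kernel errors are reported asynchronously, so a device-side assert raised in an earlier cell can surface at an unrelated later call such as torch.manual_seed, and the process usually stays in that error state until it is restarted. A minimal sketch, assuming a freshly restarted Colab runtime and that this is the first cell executed:

```python
import os

# Must be set before the first CUDA call; the safest place is before
# importing torch, in a freshly restarted runtime.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# After this, import torch and rerun the cells one by one. With blocking
# launches, the traceback should point at the call that actually failed.
```

If the seeding cell then runs cleanly on its own, the assert most likely comes from a later cell (for instance, a loss computed on out-of-range labels) rather than from the environment.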
