torch.manual_seed(seed) raises RuntimeError: CUDA error: device-side assert triggered

Question

I am using Google Colab when I get this error. Here is my code. I can't find anything wrong with it: it ran correctly a few hours ago but suddenly started failing, and I don't know why.

import numpy as np
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
    print('There are %d GPU(s) available.' % torch.cuda.device_count())
    print('We will use the GPU:', torch.cuda.get_device_name(0))
else:
    print('No GPU available, using the CPU instead.')
    device = torch.device("cpu")

# Seed every random number generator for reproducibility.
seed = 1
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True

The error is

There are 1 GPU(s) available.
We will use the GPU: Tesla P100-PCIE-16GB
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-121-436d9d8bb120> in <module>()
      9 seed=1
     10 np.random.seed(seed)
---> 11 torch.manual_seed(seed)
     12 torch.cuda.manual_seed_all(seed)
     13 torch.backends.cudnn.deterministic = True

3 frames
/usr/local/lib/python3.7/dist-packages/torch/cuda/random.py in cb()
    109         for i in range(device_count()):
    110             default_generator = torch.cuda.default_generators[i]
--> 111             default_generator.manual_seed(seed)
    112 
    113     _lazy_call(cb, seed_all=True)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Could anyone help me?

Carlo Longhi · Accepted Answer · 2022-04-19 09:03:11Z

3

In my experience, this error can occur when the labels in your targets are inconsistent with the number of classes in your model.

To solve it, you can try the following:

Make sure that the labels in your target data start from 0. If you have n classes in your data, your target classes should be [0, 1, 2, ..., n-1].
Make sure that the model you are using is configured for n classes.
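
The two checks above can be sketched as a small helper. This is only an illustration under assumptions: find_invalid_labels, the sample labels, and num_classes are hypothetical names and values, not something from the question. For a PyTorch tensor of targets you could pass labels.tolist().

```python
def find_invalid_labels(labels, num_classes):
    """Return the sorted set of label values outside [0, num_classes - 1]."""
    return sorted({int(y) for y in labels if not 0 <= int(y) < num_classes})


# Hypothetical data: labels numbered 1..3 fed to a 3-class model.
labels = [1, 2, 3, 1, 3]
num_classes = 3

bad = find_invalid_labels(labels, num_classes)
print("out-of-range labels:", bad)  # 3 is invalid for classes 0..2

# Shifting 1-based labels down by one makes them valid for this model.
shifted = [y - 1 for y in labels]
print("after shifting:", find_invalid_labels(shifted, num_classes))
```

Running a check like this on the CPU before training avoids hitting the assert on the GPU, where the message is much less specific.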


Aleksander Pohl · Answer · 2023-07-12 13:56:24Z

2

The accepted answer seems strange to me. The code in the question does not do any tensor manipulation; it only initializes the random number generators.

In my experience, this error can happen when the environment is not properly initialized. For example, the cuDNN library might not be loaded, or there may be some other CUDA-related problem. In my case, the cause was a missing call to source a Conda script (source /net/software/v1/software/Miniconda3/4.9.2/etc/profile.d/conda.sh).
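
One way to check the environment is a short diagnostic like the sketch below. It is an assumption-laden illustration, not a definitive test: cuda_sanity_check is a hypothetical helper name, and the report keys are made up for this example. It sets CUDA_LAUNCH_BLOCKING=1, as the error message suggests, before any CUDA work happens, then tries a tiny operation on the GPU.

```python
import os

# Set before the first CUDA call so kernel errors are reported synchronously.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def cuda_sanity_check():
    """Return a short report about the local PyTorch/CUDA setup."""
    report = {}
    try:
        import torch
    except ImportError as exc:
        report["torch"] = f"not importable: {exc}"
        return report

    report["torch"] = torch.__version__
    report["built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["cudnn_available"] = torch.backends.cudnn.is_available()
        try:
            # A trivial kernel launch: fails here if the CUDA context is broken.
            x = torch.ones(2, device="cuda")
            report["tiny_op"] = float((x + x).sum().item())
        except RuntimeError as exc:
            report["tiny_op"] = f"failed: {exc}"
    return report


print(cuda_sanity_check())
```

If the tiny operation already fails in a fresh process, the problem is likely in the environment rather than in the model or data. In a notebook, a device-side assert raised by an earlier cell leaves the CUDA context unusable, so restart the runtime before running the check.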


1 Comment

Madghostek Over a year ago

The other answer probably refers to using the transformers library and getting the labels wrong, or forgetting to specify the class count for a model. For some reason, the exact same error pops up in that case, even when the environment is working.
