Recently, I upgraded Ubuntu from 22.04 to 24.04 and found that the performance of my trained deep network, written in PyTorch, had degraded. After debugging, I found that the problem lies in copying data from one GPU to another. The test code:
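import numpy as np
import torch

a = np.ones([2, 64, 1500, 800]).astype('float32')
a = torch.from_numpy(a)
b = a.to('cuda:0').to('cuda:1')        # direct GPU-to-GPU copy
c = a.to('cuda:0').cpu().to('cuda:1')  # same data, staged through the CPU
print(torch.min(b))  # min is 0 (unexpected)
print(torch.min(c))  # min is 1 (expected)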
The values in b are very strange: some entries are 0 and the others are 1, even though every entry should be 1.
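A minimal sketch for narrowing this down: it counts how many entries differ between the direct copy and the CPU-staged copy, and reports whether PyTorch says the two devices can access each other as peers. The helper name compare_gpu_copies and its parameters are hypothetical, and whether peer-to-peer access is related to the problem is only an assumption, not something confirmed here. The guarded import is there only so the snippet also loads on a machine without PyTorch.

```python
try:
    import torch
except ImportError:  # lets the snippet load on machines without PyTorch
    torch = None


def compare_gpu_copies(shape=(2, 64, 1500, 800), src=0, dst=1):
    """Compare a direct GPU-to-GPU copy with a copy staged through the CPU.

    Returns None when PyTorch or two CUDA devices are not available.
    """
    if torch is None or not torch.cuda.is_available():
        return None
    if torch.cuda.device_count() <= max(src, dst):
        return None

    a = torch.ones(shape, dtype=torch.float32)
    on_src = a.to(f"cuda:{src}")
    direct = on_src.to(f"cuda:{dst}")        # device-to-device copy
    staged = on_src.cpu().to(f"cuda:{dst}")  # routed through host memory
    torch.cuda.synchronize(dst)

    return {
        # Whether the driver reports peer access from src to dst
        "peer_access": torch.cuda.can_device_access_peer(src, dst),
        # Number of entries where the two copies disagree
        "mismatched": int((direct != staged).sum().item()),
        "total": direct.numel(),
    }


print(compare_gpu_copies())
```

If "mismatched" is nonzero, the direct copy is the one that is corrupted, since the staged copy matches the source data in the test above.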
Can someone explain the reason?