I have encountered a problem while executing a function from Hugging Face's transformers library on an Intel GPU (XPU) wheel of torch. I am admittedly doing something I normally shouldn't, since Intel GPUs seem to have minimal support for transformers. I ran some very basic diagnostics on the incompatibility that produces this traceback:
Traceback (most recent call last):
  File "/home/cemc/Desktop/python.py.tar/Python Stuff/Group Policy Gradient and Relative Policy Optimization.py", line 22, in <module>
    sequences = model.generate(**inputs, max_new_tokens=32)
  File "/home/cemc/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/cemc/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2564, in generate
    result = decoding_method(
  File "/home/cemc/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2784, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/cemc/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/cemc/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cemc/.local/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1068, in forward
    transformer_outputs = self.transformer(
  File "/home/cemc/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/cemc/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cemc/.local/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 873, in forward
    causal_mask = create_causal_mask(
  File "/home/cemc/.local/lib/python3.10/site-packages/transformers/masking_utils.py", line 788, in create_causal_mask
    early_exit, attention_mask, packed_sequence_mask, kv_length, kv_offset = _preprocess_mask_arguments(
  File "/home/cemc/.local/lib/python3.10/site-packages/transformers/masking_utils.py", line 723, in _preprocess_mask_arguments
    attention_mask = attention_mask.to(device=cache_position.device, dtype=torch.bool)
RuntimeError: UR error: 45 (UR_RESULT_ERROR_INVALID_ARGUMENT)
The error was in fact not caused by a data type incompatibility (see the diagnostic below). So I now suspect that the call
attention_mask = attention_mask.to(device=cache_position.device, dtype=torch.bool)
is ordering my GPU to do something it cannot do. That is why I want to log what this function is asking my GPU to do, so I can hand-patch or alter it, but I do not know where to look. I asked ChatGPT for a solution to the exact problem in the title, since I didn't want to bother real people with real jobs with this minor issue, but it gave methods that do not work. I also asked some senior devs where I am interning; they told me there is a way to log CPU/GPU usage, but they didn't know how to log operations. I am running an Intel Iris Xe GPU, my operating system is Ubuntu, and I am executing the code from a terminal.
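If there is a way to do this from within PyTorch itself, something like TorchDispatchMode seems relevant, since it intercepts every ATen operator as it is dispatched, so the last op printed before the crash should be the culprit. Here is a minimal sketch of what I mean (untested on the XPU backend, and the OpLogger name is my own):
import torch
from torch.utils._python_dispatch import TorchDispatchMode

class OpLogger(TorchDispatchMode):
    # Print every ATen op the model dispatches before running it.
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        print(func)
        return func(*args, **(kwargs or {}))

with OpLogger():
    sequences = model.generate(**inputs, max_new_tokens=32)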
I checked whether the error came from a data type incompatibility by running:
import torch

device = torch.device("xpu:0")
dtypes = [torch.float32, torch.float16, torch.bfloat16, torch.int32, torch.bool]
for dt in dtypes:
    try:
        t = torch.zeros(1, dtype=dt, device=device)
        print(f"{dt} -> supported")
    except Exception as e:
        print(f"{dt} -> not supported ({e})")
This returned:
torch.float32 -> supported
torch.float16 -> supported
torch.bfloat16 -> supported
torch.int32 -> supported
torch.bool -> supported
I concluded that no data type incompatibility existed and that the error instead comes from the operations the function orders on the GPU.
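However, I realize this test only checks tensor creation on the device, while the failing line does a device move and a dtype cast in one .to() call on a CPU tensor, which is a different code path. Here is a minimal sketch isolating that exact pattern (the shape and the int64 dtype are my assumptions about the attention_mask that generate() builds):
import torch

device = torch.device("xpu:0")
mask = torch.ones(2, 8, dtype=torch.int64)  # attention_mask is typically int64 on CPU
try:
    out = mask.to(device=device, dtype=torch.bool)  # combined move + cast, as in masking_utils.py line 723
    print("combined device+dtype .to() -> supported")
except RuntimeError as e:
    print(f"combined device+dtype .to() -> failed ({e})")
# A possible hand-patch I may try (an untested assumption): cast on CPU first,
# then transfer, so the XPU backend never sees the fused move+cast.
out = mask.to(dtype=torch.bool).to(device=device)
If the combined call fails while the split version works, that would confirm the operation, rather than the data type, is what my GPU rejects.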