
I have deployed a trained PyTorch model to a Google Vertex AI Prediction endpoint. The endpoint is working fine, giving me predictions, but when I examine its logs in Logs Explorer, I see:

INFO 2023-01-11T10:34:53.270885171Z Number of GPUs: 0

INFO 2023-01-11T10:34:53.270888834Z Number of CPUs: 4

This is despite the fact that I set the endpoint to use NVIDIA_TESLA_T4 as the accelerator type:

Screenshot of Google Cloud Console UI showing NVIDIA_TESLA_T4 selected as the accelerator type

Why does the log show 0 GPUs, and does this mean TorchServe is not taking advantage of the accelerator GPU?

  • Hi @urig the availability of each type of GPU depends on the region you use for your model. Could you specify the region? Commented Jan 12, 2023 at 7:35
  • Thanks @kiranmathew 🌷 . I'm in europe-west4 where NVIDIA_TESLA_T4 GPUs are regularly available to me for custom jobs in training. If Vertex AI was unable to make one available, should it not have indicated this to me somehow? Commented Jan 12, 2023 at 9:05

1 Answer


This is a common problem with PyTorch and CUDA. GPU support is only available when a CUDA-enabled build of PyTorch is installed, i.e. one compiled with CUDA support; a CPU-only build will report zero GPUs even when an accelerator is attached to the machine. It is therefore recommended to use serving container images that ship PyTorch with its CUDA capabilities enabled.
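One way to diagnose this from inside the container: PyTorch wheels carry a local version suffix that identifies the build, e.g. `1.13.1+cu117` for a CUDA 11.7 build versus `1.13.1+cpu` for a CPU-only build. A minimal sketch (the helper function is illustrative, not part of any library):

```python
# Hypothetical helper: decide from a PyTorch version string whether the
# installed wheel was built with CUDA. Official wheels tag the build in the
# local version suffix, e.g. "1.13.1+cu117" (CUDA) vs "1.13.1+cpu" (CPU-only).
def is_cuda_build(version: str) -> bool:
    """Return True if the version string indicates a CUDA-enabled build."""
    _, _, local = version.partition("+")
    return local.startswith("cu")

# Inside the serving container you would check the real install, e.g.:
#   import torch
#   print(torch.__version__)          # "1.13.1+cpu" -> CPU-only wheel
#   print(torch.cuda.is_available())  # False on a CPU-only build,
#                                     # even with a T4 attached
print(is_cuda_build("1.13.1+cpu"))    # False: CPU-only wheel
print(is_cuda_build("1.13.1+cu117"))  # True: CUDA 11.7 wheel
```

If the check comes back CPU-only, the fix is to rebuild the image on a CUDA-enabled PyTorch install rather than to change anything on the endpoint itself.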
