Question
How can I programmatically check if llama-cpp-python is installed with support for a CUDA-capable GPU?
Context
In my program, I want to warn developers when their system is not configured to let llama-cpp-python run LLMs with GPU acceleration. For example, they may have installed the library with a plain pip install llama-cpp-python, without setting the environment variables needed for CUDA acceleration, or the CUDA Toolkit may be missing from their operating system.
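To make the goal concrete, this is roughly the helper I want to provide; the function names are placeholders of my own, and is_gpu_available is exactly the check I am asking how to implement:

import warnings

def is_gpu_available() -> bool:
    # Placeholder: this is the detection logic this question is about.
    raise NotImplementedError

def warn_if_no_gpu() -> None:
    # Warn developers early when llama-cpp-python cannot offload to a CUDA GPU,
    # so they notice a CPU-only install before running slow inference.
    if not is_gpu_available():
        warnings.warn(
            "llama-cpp-python appears to be installed without CUDA support; "
            "LLM inference will fall back to the CPU.",
            RuntimeWarning,
        )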
What I Have Tried
In earlier versions of the library, I could detect this reliably: my check reported a GPU if and only if the library had been installed with the appropriate environment variables, and in that case I also saw fast responses and high GPU utilization.
Initially, I checked GPU availability with:
from llama_cpp.llama_cpp import GGML_USE_CUBLAS

def is_gpu_available_v1() -> bool:
    return GGML_USE_CUBLAS
Later, GGML_USE_CUBLAS was removed. For some time, I used the following alternative:
from llama_cpp.llama_cpp import _load_shared_library

def is_gpu_available_v2() -> bool:
    lib = _load_shared_library('llama')
    return hasattr(lib, 'ggml_init_cublas')
With newer versions of the library, this second approach consistently returns False, even when inference is actually running on the GPU.
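For reference, here is a sketch of the kind of replacement check I am considering. It assumes the installed version exposes llama_supports_gpu_offload (which recent releases appear to bind) and that, failing that, a CUDA backend symbol such as ggml_backend_cuda_init would be visible in the shared library returned by _load_shared_library; I have not confirmed either assumption across versions:

import llama_cpp
from llama_cpp.llama_cpp import _load_shared_library

def is_gpu_available_v3() -> bool:
    # Preferred: ask the bindings directly whether the build supports GPU offload.
    # llama_supports_gpu_offload may not exist in every version, hence the getattr.
    supports_gpu_offload = getattr(llama_cpp, "llama_supports_gpu_offload", None)
    if supports_gpu_offload is not None:
        return bool(supports_gpu_offload())
    # Fallback: probe the shared library for a CUDA backend symbol.
    # ggml_init_cublas is gone in newer builds; ggml_backend_cuda_init is my guess
    # at its successor and may live in a separate backend library instead.
    lib = _load_shared_library("llama")
    return hasattr(lib, "ggml_backend_cuda_init")

Is this a reasonable direction, or is there a supported way to query CUDA support programmatically?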