
I struggled a lot while enabling GPU support on my 32GB Windows 10 machine with a 4GB Nvidia P100 GPU for Python programming. My LLMs did not use the machine's GPU during inference. After spending a few days on this, I thought I would summarize the step-by-step approach that worked for me.

  1. Install the C++ build tools. I did it via the Visual Studio 2022 Installer, selecting the packages under "Desktop Development with C++" and checking the option "Windows 10 SDK (10.0.20348.0)" as shown in this image (https://i.sstatic.net/vLDy7.png). Install the packages.
  2. Download and install the Nvidia CUDA Toolkit (https://developer.nvidia.com/cuda-downloads).
  3. Ensure that the CUDA_PATH variable is set in your environment variables.
  4. In Visual Studio Code, set the following environment variables:
$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
$env:CUDACXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe"
  5. Finally, run:
pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade

Then, when running the Python program, you will see that BLAS is set to 1 (https://i.sstatic.net/iKIkV.png).
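To sanity-check the install from Python, here is a minimal sketch (the model path is a placeholder; n_gpu_layers=-1 asks llama-cpp-python to offload all layers):

from llama_cpp import Llama

# Placeholder path; point this at any local GGUF model.
llm = Llama(model_path="models/your-model.gguf", n_gpu_layers=-1, verbose=True)
# With verbose=True, the startup log reports the backend and how many layers were offloaded.
print(llm("Q: What is 2 + 2? A:", max_tokens=8)["choices"][0]["text"])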

Hope this helps the community too!

  • Please post the solution as an answer and accept it by clicking the tick icon ✅ to the left of the answer, so that the community can see the question has been answered. See Can I answer my own question? Just a reminder :) Commented Jan 17, 2024 at 7:11
  • Stack Overflow is not a blog, it's a Q&A site. However, you may answer your own question, but the first step is asking a question, and that is what's missing. Commented Jan 18, 2024 at 0:47

3 Answers

  1. Install the C++ build tools. I did it via the Visual Studio 2022 Installer, selecting the packages under "Desktop Development with C++" and checking the option "Windows 10 SDK (10.0.20348.0)" as shown in this image (https://i.sstatic.net/vLDy7.png). Install the packages.
  2. Download and install the Nvidia CUDA Toolkit (https://developer.nvidia.com/cuda-downloads).
  3. Ensure that the CUDA_PATH variable is set in your environment variables.
  4. In Visual Studio Code, set the following environment variables:
$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
$env:CUDACXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe"
  5. Finally, run:
pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
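To double-check before the reinstall that the variables are actually visible to the build (paths assume the default CUDA v12.2 install location), a quick PowerShell check:

$env:CMAKE_ARGS   # should print -DLLAMA_CUBLAS=on
$env:CUDACXX      # should print the full path to nvcc.exe
nvcc --version    # confirms the CUDA toolkit is on PATH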



A pre-built wheel with CUDA support is the best option, as long as your system meets these requirements:

  • CUDA Version is 12.1, 12.2, 12.3, or 12.4
  • Python Version is 3.10, 3.11 or 3.12
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version>

Where <cuda-version> is one of the following:

  • cu121: CUDA 12.1
  • cu122: CUDA 12.2
  • cu123: CUDA 12.3
  • cu124: CUDA 12.4
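If you are not sure which toolkit version you have, nvcc reports it (note that the "CUDA Version" shown by nvidia-smi is the driver's maximum supported version, which may differ from the installed toolkit):

nvcc --version   # look for "release 12.x" and pick the matching cuXXX suffix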

For example, to install the CUDA 12.1 wheel:

pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
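To confirm the installed wheel can actually offload to the GPU, recent llama-cpp-python versions expose a low-level binding for this (a minimal check, assuming the binding is available in your version):

python -c "import llama_cpp; print(llama_cpp.llama_supports_gpu_offload())"

It should print True for a CUDA-enabled build.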



I was also facing the same issue. I am running a Windows 11 system with a 12GB RTX 3060 and 32GB of RAM, and wanted to offload some layers to the GPU for a 20GB model.

CUDA v12.9 is properly installed, and nvcc --version and nvidia-smi work fine. llama-server --list-devices detects the GPU correctly.
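For reference, those sanity checks are just these three commands:

nvcc --version                # CUDA toolkit compiler version
nvidia-smi                    # driver status and GPU visibility
llama-server --list-devices   # devices visible to llama.cpp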

The problem was with pip install llama-cpp-python: for some reason it was not detecting CUDA properly. I tried all the steps mentioned in this thread, but it still didn't work. Building from source fixed the issue for me.

  • Clone the repo:
git clone --recursive https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python 
  • Setting the environment variables is the same as mentioned in this post; I had an extra one because of a separate CURL installation.
set FORCE_CMAKE=1
set CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_TOOLCHAIN_FILE=C:\Users\xxxx\vcpkg\scripts\buildsystems\vcpkg.cmake"
  • Then install from source:
python -m pip install . --no-cache-dir --force-reinstall --upgrade

After this, it was using the GPU.
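For completeness, a minimal sketch of the partial offload described above (the model path and layer count are illustrative; tune n_gpu_layers until the model fits in VRAM):

from llama_cpp import Llama

# Placeholder path; n_gpu_layers=20 offloads only part of the model to the 12GB GPU.
llm = Llama(model_path="models/20gb-model.gguf", n_gpu_layers=20, verbose=True)
# The verbose startup log lists how many layers landed on the GPU vs. the CPU.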

