
I struggled a lot while enabling GPU support on my 32GB Windows 10 machine with a 4GB Nvidia P100 GPU for Python programming. My LLMs did not use the machine's GPU during inference. After spending a few days on this, I thought I would summarize the step-by-step approach that worked for me.

  1. Install the C++ build tools. I did it via the Visual Studio 2022 Installer, selecting the packages under "Desktop Development with C++" and checking the option "Windows 10 SDK (10.0.20348.0)" as shown in this image (https://i.sstatic.net/vLDy7.png). Install the packages.
  2. Download and install the Nvidia CUDA Toolkit (https://developer.nvidia.com/cuda-downloads).
  3. Ensure that the CUDA_PATH variable is set in your environment variables.
  4. In Visual Studio Code, set the following environment variables:
$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
$env:CUDACXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe"
  5. Finally, run:
pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade

Then, when running the Python program, you will see that BLAS is set to 1 (https://i.sstatic.net/iKIkV.png).
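To sanity-check the install from Python, here is a minimal sketch (the model path is a placeholder; n_gpu_layers=-1 asks llama-cpp-python to offload all layers):

from llama_cpp import Llama

# Placeholder path; point this at any local GGUF model.
llm = Llama(model_path="models/your-model.gguf", n_gpu_layers=-1, verbose=True)
# With verbose=True, the startup log reports the backend and how many layers were offloaded.
print(llm("Q: What is 2 + 2? A:", max_tokens=8)["choices"][0]["text"])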

Hope this helps the community too!

  • Please post the solution as an answer and accept it by clicking the tick icon ✅ to the left of the answer, so that the community can see the question has been answered. See Can I answer my own question? Just a reminder :) Commented Jan 17, 2024 at 7:11
  • Stack Overflow is not a blog, it's a Q&A site. However, you may answer your own question, but the first step is asking a question, and that is what's missing. Commented Jan 18, 2024 at 0:47

3 Answers

  1. Install the C++ build tools. I did it via the Visual Studio 2022 Installer, selecting the packages under "Desktop Development with C++" and checking the option "Windows 10 SDK (10.0.20348.0)" as shown in this image (https://i.sstatic.net/vLDy7.png). Install the packages.
  2. Download and install the Nvidia CUDA Toolkit (https://developer.nvidia.com/cuda-downloads).
  3. Ensure that the CUDA_PATH variable is set in your environment variables.
  4. In Visual Studio Code, set the following environment variables:
$env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
$env:CUDACXX="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.exe"
  5. Finally, run:
pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
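To double-check before the reinstall that the variables are actually visible to the build (paths assume the default CUDA v12.2 install location), a quick PowerShell check:

$env:CMAKE_ARGS   # should print -DLLAMA_CUBLAS=on
$env:CUDACXX      # should print the full path to nvcc.exe
nvcc --version    # confirms the CUDA toolkit is on PATH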



A pre-built wheel with CUDA support is the best option, as long as your system meets these requirements:

  • CUDA Version is 12.1, 12.2, 12.3, or 12.4
  • Python Version is 3.10, 3.11 or 3.12
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/<cuda-version>

Where <cuda-version> is one of the following:

  • cu121: CUDA 12.1
  • cu122: CUDA 12.2
  • cu123: CUDA 12.3
  • cu124: CUDA 12.4
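If you are not sure which toolkit version you have, nvcc reports it (note that the "CUDA Version" shown by nvidia-smi is the driver's maximum supported version, which may differ from the installed toolkit):

nvcc --version   # look for "release 12.x" and pick the matching cuXXX suffix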

For example, to install the CUDA 12.1 wheel:

pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121
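To confirm the installed wheel can actually offload to the GPU, recent llama-cpp-python versions expose a low-level binding for this (a minimal check, assuming the binding is available in your version):

python -c "import llama_cpp; print(llama_cpp.llama_supports_gpu_offload())"

It should print True for a CUDA-enabled build.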



I was also facing the same issue. I am running a Windows 11 system with a 12GB RTX 3060 and 32GB of RAM, and wanted to offload some layers to the GPU for a 20GB model.

CUDA v12.9 is properly installed, and nvcc --version and nvidia-smi work fine. llama-server --list-devices detects the GPU correctly.
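For reference, those sanity checks are just these three commands:

nvcc --version                # CUDA toolkit compiler version
nvidia-smi                    # driver status and GPU visibility
llama-server --list-devices   # devices visible to llama.cpp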

The problem was with pip install llama-cpp-python: for some reason it was not detecting CUDA properly. I tried all the steps mentioned in this thread, but it still didn't work. Building from source fixed the issue for me.

  • Clone the repo:
git clone --recursive https://github.com/abetlen/llama-cpp-python.git
cd llama-cpp-python 
  • Setting the environment variables is the same as mentioned in this post; I had an extra one because of a separate CURL installation.
set FORCE_CMAKE=1
set CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_TOOLCHAIN_FILE=C:\Users\xxxx\vcpkg\scripts\buildsystems\vcpkg.cmake"
  • Then install from source:
python -m pip install . --no-cache-dir --force-reinstall --upgrade

After this, it was using the GPU.
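For completeness, a minimal sketch of the partial offload described above (the model path and layer count are illustrative; tune n_gpu_layers until the model fits in VRAM):

from llama_cpp import Llama

# Placeholder path; n_gpu_layers=20 offloads only part of the model to the 12GB GPU.
llm = Llama(model_path="models/20gb-model.gguf", n_gpu_layers=20, verbose=True)
# The verbose startup log lists how many layers landed on the GPU vs. the CPU.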

