61 questions
0
votes
1
answer
305
views
How to properly install llama-cpp-python on Windows 11 with GPU support
I have been trying to install llama-cpp-python on Windows 11 with GPU support for a while, and it just doesn't work no matter what I try. I installed the necessary Visual Studio toolkit packages, ...
1
vote
0
answers
302
views
Failed to build installable wheels for some pyproject.toml based projects llama-cpp-python
I tried to install llama-cpp-python via pip, but I get an error during the installation.
The command that I wrote:
CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip install ...
1
vote
0
answers
185
views
llama-cpp-python installing for x86_64 instead of arm64
I am trying to set up local, high-speed NLP but am failing to install the arm64 version of llama-cpp-python.
Even when I run
CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \
...
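A first diagnostic for this one: an interpreter running under Rosetta reports x86_64 and will always build x86_64 wheels regardless of CMAKE_ARGS. A minimal check; nothing here is specific to llama-cpp-python:
import platform, sys

# On Apple silicon this should print "arm64"; "x86_64" means the interpreter
# itself runs under Rosetta and pip will compile x86_64 binaries to match it.
print(platform.machine())
print(sys.version)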
3
votes
0
answers
196
views
Cannot run inference on images with llama-cpp-python
I am new to this. I have been trying but could not make the model answer questions about images.
from llama_cpp import Llama
import torch
from PIL import Image
import base64
llm = Llama(
model_path='Holo1-...
0
votes
0
answers
99
views
llama-cpp and transformers with pyinstaller in creation of .exe file
I am attempting to bundle a RAG agent into a .exe.
However, when using the .exe I keep running into the same two problems.
The first problem is locating llama-cpp, which I have fixed.
The ...
-1
votes
2
answers
589
views
Getting an error on a Windows PC while running pip install llama-cpp-python
Creating directory "llava_shared.dir\Release".
Structured output is enabled. The formatting of compiler diagnostics will reflect the error hierarchy. See https://aka.ms/cpp/structured-output ...
0
votes
0
answers
90
views
Generating an n-gram dataset based on an LLM
I want a dataset of common n-grams and their log likelihoods. Normally I would download the Google Books Ngram Exports, but I wonder if I can generate a better dataset using a large language model. ...
1
vote
0
answers
143
views
Does Ollama guarantee cross-platform determinism with identical quantization, seed, temperature, and version but differing hardware?
I’m working on a project that requires fully deterministic outputs across different machines using Ollama. I’ve ensured the following parameters are identical:
Model quantization (e.g., llama2:7b-q4_0)...
0
votes
0
answers
256
views
Why Does Running LLaMA 13B Model with llama_cpp on CPU Take Excessive Time and Produce Poor Outputs?
I'm experiencing significant performance and output quality issues when running the LLaMA 13B model using the llama_cpp library on my laptop. The same setup works efficiently with the LLaMA 7B model. ...
0
votes
0
answers
268
views
How do you enable runtime-repack in llama cpp python?
After updating llama-cpp-python I am getting an error when trying to run an ARM-optimized GGUF model: TYPE_Q4_0_4_4 REMOVED, use Q4_0 with runtime repacking. After looking into it, the error comes from ...
2
votes
1
answer
1k
views
How to make an LLM remember previous runtime chats
I want my LLM chatbot to remember previous conversations even after restarting the program. It is made with llama cpp python and langchain; it has conversation memory of the present chat, but obviously ...
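A common approach is to persist the message list to disk and replay it on startup. A minimal sketch using llama-cpp-python's chat API; the history.json path and model file are hypothetical:
import json, os
from llama_cpp import Llama

HISTORY = "history.json"              # hypothetical transcript file
llm = Llama(model_path="model.gguf")  # hypothetical model path

# Replay earlier turns so the model sees them as context after a restart.
messages = []
if os.path.exists(HISTORY):
    with open(HISTORY) as f:
        messages = json.load(f)

messages.append({"role": "user", "content": "Hello again!"})
reply = llm.create_chat_completion(messages=messages)
messages.append(reply["choices"][0]["message"])

# Persist the updated transcript for the next run.
with open(HISTORY, "w") as f:
    json.dump(messages, f)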
1
vote
0
answers
66
views
My llama2 model is talking to itself, asking questions and answering them, using a conversational retrieval chain
I was implementing RAG on a document using the Llama2 model, but my model is asking questions to itself and answering them.
llm = LlamaCpp(model_path=model_path,
temperature=0,
...
0
votes
0
answers
180
views
Unable to set top_k value in Llama cpp Python server
I start llama cpp Python server with the command:
python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --n_ctx 8192 --chat_format functionary
Then I run my Python script which ...
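If the script talks to the server through the OpenAI client, fields outside the OpenAI schema can usually be passed via extra_body. A sketch assuming the openai v1 client and that the server accepts top_k in its extended request body:
from openai import OpenAI

# The server's OpenAI-compatible endpoint; the API key is ignored locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-local")

resp = client.chat.completions.create(
    model="functionary",  # hypothetical model name
    messages=[{"role": "user", "content": "Hi"}],
    # top_k is not part of the OpenAI schema, so it rides in extra_body.
    extra_body={"top_k": 10},
)
print(resp.choices[0].message.content)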
2
votes
1
answer
950
views
How to use `llama-cpp-python` to output list of candidate tokens and their probabilities?
I want to choose my tokens manually, instead of letting llama-cpp-python automatically choose one for me.
This requires me to see a list of candidate next tokens, along with their probabilities, ...
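One answer-shaped sketch: request logprobs on a completion call, which returns the top candidates per position. This assumes a llama-cpp-python version where logprobs requires logits_all=True at load time; the model file is hypothetical:
from llama_cpp import Llama

# logits_all=True keeps logits for every position so logprobs can be reported.
llm = Llama(model_path="model.gguf", logits_all=True)

out = llm("The capital of France is", max_tokens=1, logprobs=10)

# top_logprobs holds one dict per position mapping candidate token -> log prob.
for candidates in out["choices"][0]["logprobs"]["top_logprobs"]:
    for token, logprob in candidates.items():
        print(repr(token), logprob)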
0
votes
2
answers
856
views
How do I stream output as it is being generated by an LLM in Streamlit?
code:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain import PromptTemplate
from langchain_community.llms import ...
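Recent Streamlit versions can consume a generator directly via st.write_stream. A minimal sketch, assuming Streamlit >= 1.31 and a hypothetical model file, with llama-cpp-python standing in for the LangChain stack:
import streamlit as st
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # hypothetical model file

def token_stream(prompt):
    # stream=True yields chunks as they are generated instead of one blob.
    for chunk in llm(prompt, max_tokens=256, stream=True):
        yield chunk["choices"][0]["text"]

prompt = st.text_input("Ask something")
if prompt:
    # st.write_stream renders each yielded piece immediately.
    st.write_stream(token_stream(prompt))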
1
vote
0
answers
168
views
LLama 2 prompt template
I am trying to build a chatbot using LangChain. This chatbot supports different backends:
Ollama
Hugging Face
Llama.cpp
OpenAI
and in a YAML file, I can configure the backend (aka provider) and the ...
0
votes
1
answer
596
views
Does langchain with llama-cpp-python fail to work with very long prompts?
I'm trying to create a service using the llama3-70b model by combining langchain and llama-cpp-python on a server workstation. While the model works well with short prompts (question1, question2), it ...
1
vote
2
answers
733
views
Unable to make llama.cpp on M1 Mac
When I try installing Llama.cpp, I get the following error:
ld: warning: ignoring file '/Users/krishparikh/Projects/LLM/llama.cpp/ggml/src/ggml-metal-embed.o': found architecture 'x86_64', required ...
0
votes
1
answer
634
views
Unable to send multiple inputs using Llama CPP and Llama-index
I am using the Mistral 7b-instruct model with llama-index and load the model using LlamaCPP. When I try to run multiple inputs or prompts (open 2 websites and send 2 prompts), it gives me ...
2
votes
2
answers
4k
views
Detecting GPU availability in llama-cpp-python
Question
How can I programmatically check if llama-cpp-python is installed with support for a CUDA-capable GPU?
Context
In my program, I am trying to warn the developers when they fail to configure ...
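One programmatic check is the low-level binding llama_supports_gpu_offload, which reports whether the compiled library can offload layers at all. A sketch, hedged because this symbol has moved between llama-cpp-python versions:
import llama_cpp

# True when the bundled libllama was built with a GPU backend (CUDA, Metal, ...);
# False for a CPU-only build, so developers can be warned at startup.
if llama_cpp.llama_supports_gpu_offload():
    print("GPU offload available")
else:
    print("CPU-only build of llama-cpp-python")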
0
votes
1
answer
244
views
(Windows) Setting environment variables with spaces in text
I am trying to install llama-cpp-python on Windows 11. I have installed and set up the CMAKE_ARGS environment variable to point to the MinGW gcc.exe and g++.exe to compile C and C++, but am struggling ...
-1
votes
2
answers
1k
views
I am facing ImportError: cannot import name 'LlamaCPP' from 'llama_index.llms' (unknown location) while implementing this
I am facing ImportError: cannot import name 'LlamaCPP' from 'llama_index.llms' (unknown location)
and ModuleNotFoundError: No module named 'llama_index.llms.llama_utils'
while ...
1
vote
0
answers
862
views
How can I get just the main answer from llama-3-8B-Instruct and stop it talking to itself?
I want to use llama-3 with llama-cpp-python and get a main answer to user questions, as with llama-2.
But the answers generated by llama-3 are not a single main answer like llama-2's:
Output: Hey! 👋 What can I help you ...
1
vote
0
answers
542
views
How to add streaming to my gradio chatbot when using Llama cpp python with langchain
I am integrating the Llama Cpp Python library to run huggingface LLMs locally. I am able to generate text output, but I would like to add streaming to my chatbot so that as soon as the generation is ...
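Gradio's ChatInterface treats a generator function as a streaming bot: yield the growing reply and the UI updates in place. A minimal sketch assuming gradio 4.x and a hypothetical model file, with llama-cpp-python called directly rather than through LangChain:
import gradio as gr
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # hypothetical model file

def respond(message, history):
    partial = ""
    for chunk in llm(message, max_tokens=256, stream=True):
        partial += chunk["choices"][0]["text"]
        # Each yield replaces the bot message shown so far.
        yield partial

gr.ChatInterface(respond).launch()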
3
votes
1
answer
4k
views
Llama.cpp GPU Offloading Issue - Unexpected Switch to CPU
I'm reaching out to the community for some assistance with an issue I'm encountering in llama.cpp. Previously, the program was successfully utilizing the GPU for execution. However, recently, it seems ...
2
votes
1
answer
4k
views
Running Local LLMs in Production and handling multiple requests
I am trying to run a RAG with the Gemma LLM locally. It runs fine, but I can't handle more than one request at a time.
Is there a way to handle concurrent requests while utilizing resources ...
1
vote
0
answers
65
views
Chat model provides answers without source docs
I created embeddings for only one document so far. But when I ask questions which might be in the context but are definitely not part of this single document, I would expect an answer like "I ...
1
vote
1
answer
577
views
How can I get the same result from LlamaCPP using it in Llama-index?
I am trying to run the same prompt (query) that I ran on a simple PDF (a legal pt-BR document) using pure llama cpp python, but now using llama index:
from llama_cpp import Llama
import os, re, sys
from pypdf ...
0
votes
0
answers
840
views
llama-cpp-python with metal acceleration on Apple silicon failing
I am following the instructions from the official documentation on how to install llama-cpp with GPU support on an Apple silicon Mac.
Here is my Dockerfile:
FROM python:3.11-slim
WORKDIR /code
RUN pip ...
3
votes
1
answer
5k
views
RAG with Langchain and FastAPI: Stream generated answer and return source documents
I have built a RAG application with Langchain and now want to deploy it with FastAPI. Generally it works to call a FastAPI endpoint, and the answer of the LCEL chain gets streamed. However I want ...
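One pattern for this: stream the token chunks through a StreamingResponse and append the source metadata as a final JSON line once the text is done. A sketch; rag_stream stands in for your LCEL chain's .stream() and the source list is hypothetical:
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def rag_stream(question):
    # Stand-in for your LCEL chain's .stream(question); hypothetical.
    yield from ["The ", "answer ", "arrives ", "in ", "chunks."]

@app.get("/ask")
def ask(q: str):
    def generate():
        for chunk in rag_stream(q):
            yield chunk
        # Append the source documents after generation has finished.
        yield "\n" + json.dumps({"sources": ["doc1.pdf"]})  # hypothetical
    return StreamingResponse(generate(), media_type="text/plain")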
1
vote
2
answers
7k
views
Failed to install llama-cpp-python with Metal on M2 Ultra
I followed the instruction on https://llama-cpp-python.readthedocs.io/en/latest/install/macos/.
My macOS version is Sonoma 14.4, and xcode-select is already installed (version: 15.3.0.0.1.1708646388).
...
1
vote
0
answers
2k
views
I have a problem using n_gpu_layers in the llama_cpp Llama function
I am attempting to load the Zephyr model into llama_cpp Llama, and while everything functions correctly, the performance is slow. The GPU appears to be underutilized, especially when compared to its ...
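A first check here is to request full offload and read the load log, which reports how many layers actually landed on the GPU. A sketch, assuming a CUDA or Metal build of the wheel (a CPU-only build silently ignores n_gpu_layers); the model file is hypothetical:
from llama_cpp import Llama

llm = Llama(
    model_path="zephyr.gguf",  # hypothetical model file
    n_gpu_layers=-1,           # -1 asks to offload every layer
    verbose=True,              # load log shows "offloaded N/M layers to GPU"
)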
0
votes
2
answers
4k
views
Loading an embedding model from Hugging Face in Llama Index throws an attribute error
I am trying to load embeddings like this. I changed the code to reflect the current version change in LlamaIndex, but it throws an attribute error.
from llama_index.embeddings.huggingface import ...
1
vote
3
answers
2k
views
Inconsistent completion for identical prompts and params with llama.cpp python and ctransformers
I've been comparing various langchain-compatible llama2 runtimes, using a langchain LLM chain.
Having the following parameter overrides:
# llama.cpp:
model_path="../llama.cpp/models/generated/...
1
vote
0
answers
338
views
Langserve Streaming with Llamacpp
I have built a RAG app with Llamacpp and Langserve and it generally works. However I can't find a way to stream my responses, which would be very important for the application. Here is my code:
from ...
1
vote
1
answer
2k
views
llama-cpp-python Log printing on Ubuntu
I use llama-cpp-python to run LLMs locally on Ubuntu. While generating responses it prints its logs.
How can I stop the printing of logs?
I found a way to stop log printing for llama.cpp but not for llama-...
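The constructor's verbose flag controls most of this output. A minimal sketch with a hypothetical model file; on some versions, stderr emitted by the underlying C library may still need redirecting separately:
from llama_cpp import Llama

# verbose=False suppresses llama.cpp's load and generation logging.
llm = Llama(model_path="model.gguf", verbose=False)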
0
votes
2
answers
751
views
TypeError in Python 3.11 when Using BasicModelRunner from llama-cpp-python
I'm currently taking the DeepAI's Finetuning Coursera course and encountered a bug while trying to run one of their demonstrations locally in a Jupyter notebook.
Environment:
Python version: 3.11
...
-1
votes
1
answer
2k
views
How can I fix the GPU error of llama_cpp_python?
When I set n_gpu_layers to 1, I see the following response:
To learn Python, you can consider the following options:
1. Online Courses: Websites like Coursera, edX, Codecadem♠♦♥◄!▬$▲▅
`▅☻↑↨►☻...
1
vote
3
answers
6k
views
Enable GPU for Python programming with VS Code on Windows 10 (llama-cpp-python)
I struggled a lot while enabling GPU on my 32GB Windows 10 machine with a 4GB Nvidia P100 GPU during Python programming. My LLMs did not use the GPU of my machine while inferencing. After spending a few ...
2
votes
0
answers
1k
views
Connection error in langchain with llama2 model downloaded locally
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by ...
0
votes
2
answers
2k
views
CMAKE in requirements.txt file: Install llama-cpp-python for Mac
I have put my application into a Docker container and therefore I have created a requirements.txt file. Now I need to install llama-cpp-python for Mac, as I am loading my LLM with from langchain.llms import ...
1
vote
1
answer
2k
views
M1 Chip: Running Mistral-7B with Llama.cpp Works, but Python Wrapper Causes Slowdown and Errors
I'm working on a project using an M1 chip to run the Mistral-7B model. I've successfully set up llama.cpp and can run the model using the following command:
./build/bin/main --color --model "./../...
1
vote
0
answers
1k
views
llama-cpp-python on GPU: Delay between prompt submission and first token generation with longer prompts
I've been building a RAG pipeline using the llama-cpp-python OpenAI compatible server functionality and have been working my way up from running on just a laptop to running this on a dedicated ...
3
votes
1
answer
3k
views
LLM model is not loading into the GPU even after BLAS = 1, LlamaCpp, Langchain, Mistral 7b GGUF Model
Confession:
First of all, I am not an expert in this area; I am just practicing and trying to learn while working. Also, I am confused about whether this kind of model does not run on this type ...
1
vote
0
answers
347
views
Llama-2, Q4-Quantized model's response time on different CPUs
I am running a quantized llama-2 model from here. I am using 2 different machines.
11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz 2.80 GHz
16.0 GB (15.8 GB usable)
Inference time on this machine is ...
4
votes
1
answer
4k
views
No GPU support while running llama-cpp-python inside a docker container
I'm trying to run llama index with llama cpp by following the installation docs but inside a docker container.
Following this repo for installation of llama_cpp_python==0.2.6.
DOCKERFILE
# Use the ...
4
votes
1
answer
3k
views
How can I install llama-cpp-python with cuBLAS using poetry?
I can install llama cpp with cuBLAS using pip as below:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
However, I don't know how to install it with cuBLAS when ...
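Poetry forwards the parent environment to its build subprocesses, so exporting the same variables before poetry add is usually enough. A sketch driving it from Python; the flag names are taken from the question, and FORCE_CMAKE is an assumption about the build setup:
import os, subprocess

env = dict(os.environ)
env["CMAKE_ARGS"] = "-DLLAMA_CUBLAS=on"
env["FORCE_CMAKE"] = "1"

# poetry add builds the sdist in a subprocess that inherits this environment.
subprocess.run(["poetry", "add", "llama-cpp-python"], env=env, check=True)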
2
votes
0
answers
1k
views
llama-index: multiple calls to query_engine.query always give "Empty Response"
I have the following code that works as expected
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"
llm = LlamaCPP(model_url=...
1
vote
1
answer
1k
views
PandasQueryEngine from llama-index is unable to execute code with the following error: invalid syntax (, line 0)
I have the following code. I am trying to use the local llama2-chat-13B model. The instructions appear to be good but the final output is erroring out.
import logging
import sys
from IPython.display ...
1
vote
1
answer
3k
views
Unable to install llama-cpp-python Package in Python - Wheel Building Process gets Stuck
I’m trying to install the llama-cpp-python package in Python, but I’m encountering an issue where the wheel building process gets stuck. Here’s the command I’m using to install the package:
pip3 ...