61 questions
0
votes
1
answer
305
views
How to properly install llama-cpp-python on Windows 11 with GPU support
I have been trying to install llama-cpp-python on Windows 11 with GPU support for a while, and it just doesn't work no matter what I try. I installed the necessary Visual Studio toolkit packages, ...
1
vote
0
answers
302
views
Failed to build installable wheels for some pyproject.toml based projects llama-cpp-python
I tried to install llama-cpp-python via pip, but I get an error during the installation.
The command that I wrote:
CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip install ...
1
vote
0
answers
185
views
llama-cpp-python installing for x86_64 instead of arm64
I am trying to set up local, high-speed NLP but am failing to install the arm64 version of llama-cpp-python.
Even when I run
CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \
...
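A first diagnostic for this one: an interpreter running under Rosetta reports x86_64 and will always build x86_64 wheels regardless of CMAKE_ARGS. A minimal check; nothing here is specific to llama-cpp-python:
import platform, sys

# On Apple silicon this should print "arm64"; "x86_64" means the interpreter
# itself runs under Rosetta and pip will compile x86_64 binaries to match it.
print(platform.machine())
print(sys.version)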
3
votes
0
answers
196
views
Cannot run inference on images with llama-cpp-python
I am new to this. I have been trying but could not make the model answer questions about images.
from llama_cpp import Llama
import torch
from PIL import Image
import base64
llm = Llama(
model_path='Holo1-...
0
votes
0
answers
99
views
llama-cpp and transformers with pyinstaller in creation of .exe file
I am attempting to bundle a RAG agent into a .exe.
However, when using the .exe I keep running into the same two problems.
The first problem is locating llama-cpp, which I have fixed.
The ...
-1
votes
2
answers
589
views
Getting an error on a Windows PC while running pip install llama-cpp-python
Creating directory "llava_shared.dir\Release".
Structured output is enabled. The formatting of compiler diagnostics will reflect the error hierarchy. See https://aka.ms/cpp/structured-output ...
0
votes
0
answers
90
views
Generating an n-gram dataset based on an LLM
I want a dataset of common n-grams and their log likelihoods. Normally I would download the Google Books Ngram Exports, but I wonder if I can generate a better dataset using a large language model. ...
1
vote
0
answers
143
views
Does Ollama guarantee cross-platform determinism with identical quantization, seed, temperature, and version but differing hardware?
I’m working on a project that requires fully deterministic outputs across different machines using Ollama. I’ve ensured the following parameters are identical:
Model quantization (e.g., llama2:7b-q4_0)...
0
votes
0
answers
256
views
Why Does Running LLaMA 13B Model with llama_cpp on CPU Take Excessive Time and Produce Poor Outputs?
I'm experiencing significant performance and output quality issues when running the LLaMA 13B model using the llama_cpp library on my laptop. The same setup works efficiently with the LLaMA 7B model. ...
0
votes
0
answers
268
views
How do you enable runtime-repack in llama cpp python?
After updating llama-cpp-python I am getting an error when trying to run an ARM-optimized GGUF model: TYPE_Q4_0_4_4 REMOVED, use Q4_0 with runtime repacking. After looking into it, the error comes from ...
2
votes
1
answer
1k
views
How to make an LLM remember previous runtime chats
I want my LLM chatbot to remember previous conversations even after restarting the program. It is made with llama cpp python and langchain; it has conversation memory of the present chat, but obviously ...
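A common approach is to persist the message list to disk and replay it on startup. A minimal sketch using llama-cpp-python's chat API; the history.json path and model file are hypothetical:
import json, os
from llama_cpp import Llama

HISTORY = "history.json"              # hypothetical transcript file
llm = Llama(model_path="model.gguf")  # hypothetical model path

# Replay earlier turns so the model sees them as context after a restart.
messages = []
if os.path.exists(HISTORY):
    with open(HISTORY) as f:
        messages = json.load(f)

messages.append({"role": "user", "content": "Hello again!"})
reply = llm.create_chat_completion(messages=messages)
messages.append(reply["choices"][0]["message"])

# Persist the updated transcript for the next run.
with open(HISTORY, "w") as f:
    json.dump(messages, f)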
1
vote
0
answers
66
views
My llama2 model is talking to itself, asking questions and answering them, using a conversational retrieval chain
I was implementing RAG on a document using the Llama2 model, but my model is asking questions to itself and answering them.
llm = LlamaCpp(model_path=model_path,
temperature=0,
...
0
votes
0
answers
180
views
Unable to set top_k value in Llama cpp Python server
I start llama cpp Python server with the command:
python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --n_ctx 8192 --chat_format functionary
Then I run my Python script which ...
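If the script talks to the server through the OpenAI client, fields outside the OpenAI schema can usually be passed via extra_body. A sketch assuming the openai v1 client and that the server accepts top_k in its extended request body:
from openai import OpenAI

# The server's OpenAI-compatible endpoint; the API key is ignored locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-local")

resp = client.chat.completions.create(
    model="functionary",  # hypothetical model name
    messages=[{"role": "user", "content": "Hi"}],
    # top_k is not part of the OpenAI schema, so it rides in extra_body.
    extra_body={"top_k": 10},
)
print(resp.choices[0].message.content)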
2
votes
1
answer
950
views
How to use `llama-cpp-python` to output list of candidate tokens and their probabilities?
I want to choose my tokens manually, instead of letting llama-cpp-python automatically choose one for me.
This requires me to see a list of candidate next tokens, along with their probabilities, ...
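One answer-shaped sketch: request logprobs on a completion call, which returns the top candidates per position. This assumes a llama-cpp-python version where logprobs requires logits_all=True at load time; the model file is hypothetical:
from llama_cpp import Llama

# logits_all=True keeps logits for every position so logprobs can be reported.
llm = Llama(model_path="model.gguf", logits_all=True)

out = llm("The capital of France is", max_tokens=1, logprobs=10)

# top_logprobs holds one dict per position mapping candidate token -> log prob.
for candidates in out["choices"][0]["logprobs"]["top_logprobs"]:
    for token, logprob in candidates.items():
        print(repr(token), logprob)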
0
votes
2
answers
856
views
How do I stream output as it is being generated by an LLM in Streamlit?
code:
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain import PromptTemplate
from langchain_community.llms import ...
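Recent Streamlit versions can consume a generator directly via st.write_stream. A minimal sketch, assuming Streamlit >= 1.31 and a hypothetical model file, with llama-cpp-python standing in for the LangChain stack:
import streamlit as st
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # hypothetical model file

def token_stream(prompt):
    # stream=True yields chunks as they are generated instead of one blob.
    for chunk in llm(prompt, max_tokens=256, stream=True):
        yield chunk["choices"][0]["text"]

prompt = st.text_input("Ask something")
if prompt:
    # st.write_stream renders each yielded piece immediately.
    st.write_stream(token_stream(prompt))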
1
vote
0
answers
168
views
LLama 2 prompt template
I am trying to build a chatbot using LangChain. This chatbot supports different backends:
Ollama
Hugging Face
Llama.cpp
OpenAI
and in a YAML file, I can configure the backend (aka provider) and the ...
0
votes
1
answer
596
views
Does langchain with llama-cpp-python fail to work with very long prompts?
I'm trying to create a service using the llama3-70b model by combining langchain and llama-cpp-python on a server workstation. While the model works well with short prompts (question1, question2), it ...
1
vote
2
answers
733
views
Unable to make llama.cpp on M1 Mac
When I try installing Llama.cpp, I get the following error:
ld: warning: ignoring file '/Users/krishparikh/Projects/LLM/llama.cpp/ggml/src/ggml-metal-embed.o': found architecture 'x86_64', required ...
0
votes
1
answer
634
views
Unable to send multiple inputs using Llama CPP and Llama-index
I am using the Mistral 7b-instruct model with llama-index and load the model using LlamaCPP. When I try to run multiple inputs or prompts (open 2 websites and send 2 prompts), it gives me ...
2
votes
2
answers
4k
views
Detecting GPU availability in llama-cpp-python
Question
How can I programmatically check if llama-cpp-python is installed with support for a CUDA-capable GPU?
Context
In my program, I am trying to warn the developers when they fail to configure ...
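One programmatic check is the low-level binding llama_supports_gpu_offload, which reports whether the compiled library can offload layers at all. A sketch, hedged because this symbol has moved between llama-cpp-python versions:
import llama_cpp

# True when the bundled libllama was built with a GPU backend (CUDA, Metal, ...);
# False for a CPU-only build, so developers can be warned at startup.
if llama_cpp.llama_supports_gpu_offload():
    print("GPU offload available")
else:
    print("CPU-only build of llama-cpp-python")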
0
votes
1
answer
244
views
(Windows) Setting environment variables with spaces in text
I am trying to install llama-cpp-python on Windows 11. I have installed and set up the CMAKE_ARGS environment variable to point to the MinGW gcc.exe and g++.exe to compile C and C++, but am struggling ...
-1
votes
2
answers
1k
views
I am facing ImportError: cannot import name 'LlamaCPP' from 'llama_index.llms' (unknown location) while implementing this
I am facing ImportError: cannot import name 'LlamaCPP' from 'llama_index.llms' (unknown location)
and ModuleNotFoundError: No module named 'llama_index.llms.llama_utils'
while ...
1
vote
0
answers
862
views
How can I get just the main answer from llama-3-8B-Instruct and stop it talking to itself?
I want to use llama-3 with llama-cpp-python and get a main answer to user questions, as with llama-2.
But the answers generated by llama-3 are not a single main answer like llama-2's:
Output: Hey! 👋 What can I help you ...
1
vote
0
answers
542
views
How to add streaming to my gradio chatbot when using Llama cpp python with langchain
I am integrating the Llama Cpp Python library to run huggingface LLMs locally. I am able to generate text output, but I would like to add streaming to my chatbot so that as soon as the generation is ...
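Gradio's ChatInterface treats a generator function as a streaming bot: yield the growing reply and the UI updates in place. A minimal sketch assuming gradio 4.x and a hypothetical model file, with llama-cpp-python called directly rather than through LangChain:
import gradio as gr
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # hypothetical model file

def respond(message, history):
    partial = ""
    for chunk in llm(message, max_tokens=256, stream=True):
        partial += chunk["choices"][0]["text"]
        # Each yield replaces the bot message shown so far.
        yield partial

gr.ChatInterface(respond).launch()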
3
votes
1
answer
4k
views
Llama.cpp GPU Offloading Issue - Unexpected Switch to CPU
I'm reaching out to the community for some assistance with an issue I'm encountering in llama.cpp. Previously, the program was successfully utilizing the GPU for execution. However, recently, it seems ...
2
votes
1
answer
4k
views
Running Local LLMs in Production and handling multiple requests
I am trying to run a RAG with the Gemma LLM locally. It runs fine, but I can't handle more than one request at a time.
Is there a way to handle concurrent requests while utilizing resources ...
1
vote
0
answers
65
views
Chat model provides answers without source docs
I created embeddings for only one document so far. But when I ask questions which might be in the context but are definitely not part of this single document, I would expect an answer like "I ...
1
vote
1
answer
577
views
How can I get the same result from LlamaCPP using it in Llama-index?
I am trying to run the same prompt (query) that I ran on a simple PDF (a legal pt-BR document) using pure llama cpp python, but now using llama index:
from llama_cpp import Llama
import os, re, sys
from pypdf ...
0
votes
0
answers
840
views
llama-cpp-python with metal acceleration on Apple silicon failing
I am following the instructions from the official documentation on how to install llama-cpp with GPU support on an Apple silicon Mac.
Here is my Dockerfile:
FROM python:3.11-slim
WORKDIR /code
RUN pip ...
3
votes
1
answer
5k
views
RAG with Langchain and FastAPI: Stream generated answer and return source documents
I have built a RAG application with Langchain and now want to deploy it with FastAPI. Generally it works to call a FastAPI endpoint, and the answer of the LCEL chain gets streamed. However I want ...
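One pattern for this: stream the token chunks through a StreamingResponse and append the source metadata as a final JSON line once the text is done. A sketch; rag_stream stands in for your LCEL chain's .stream() and the source list is hypothetical:
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

def rag_stream(question):
    # Stand-in for your LCEL chain's .stream(question); hypothetical.
    yield from ["The ", "answer ", "arrives ", "in ", "chunks."]

@app.get("/ask")
def ask(q: str):
    def generate():
        for chunk in rag_stream(q):
            yield chunk
        # Append the source documents after generation has finished.
        yield "\n" + json.dumps({"sources": ["doc1.pdf"]})  # hypothetical
    return StreamingResponse(generate(), media_type="text/plain")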
1
vote
2
answers
7k
views
Failed to install llama-cpp-python with Metal on M2 Ultra
I followed the instruction on https://llama-cpp-python.readthedocs.io/en/latest/install/macos/.
My macOS version is Sonoma 14.4, and xcode-select is already installed (version: 15.3.0.0.1.1708646388).
...
1
vote
0
answers
2k
views
I have a problem using n_gpu_layers in the llama_cpp Llama function
I am attempting to load the Zephyr model into llama_cpp Llama, and while everything functions correctly, the performance is slow. The GPU appears to be underutilized, especially when compared to its ...
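A first check here is to request full offload and read the load log, which reports how many layers actually landed on the GPU. A sketch, assuming a CUDA or Metal build of the wheel (a CPU-only build silently ignores n_gpu_layers); the model file is hypothetical:
from llama_cpp import Llama

llm = Llama(
    model_path="zephyr.gguf",  # hypothetical model file
    n_gpu_layers=-1,           # -1 asks to offload every layer
    verbose=True,              # load log shows "offloaded N/M layers to GPU"
)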
0
votes
2
answers
4k
views
Loading an embedding model from Hugging Face in Llama Index throws an attribute error
I am trying to load embeddings like this. I changed the code to reflect the current version change in LlamaIndex, but it throws an attribute error.
from llama_index.embeddings.huggingface import ...
1
vote
3
answers
2k
views
Inconsistent completion for identical prompts and params with llama.cpp python and ctransformers
I've been comparing various langchain-compatible llama2 runtimes, using a langchain LLM chain.
Having the following parameter overrides:
# llama.cpp:
model_path="../llama.cpp/models/generated/...
1
vote
0
answers
338
views
Langserve Streaming with Llamacpp
I have built a RAG app with Llamacpp and Langserve and it generally works. However I can't find a way to stream my responses, which would be very important for the application. Here is my code:
from ...
1
vote
1
answer
2k
views
llama-cpp-python Log printing on Ubuntu
I use llama-cpp-python to run LLMs locally on Ubuntu. While generating responses it prints its logs.
How can I stop the printing of logs?
I found a way to stop log printing for llama.cpp but not for llama-...
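The constructor's verbose flag controls most of this output. A minimal sketch with a hypothetical model file; on some versions, stderr emitted by the underlying C library may still need redirecting separately:
from llama_cpp import Llama

# verbose=False suppresses llama.cpp's load and generation logging.
llm = Llama(model_path="model.gguf", verbose=False)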
0
votes
2
answers
751
views
TypeError in Python 3.11 when Using BasicModelRunner from llama-cpp-python
I'm currently taking the DeepAI's Finetuning Coursera course and encountered a bug while trying to run one of their demonstrations locally in a Jupyter notebook.
Environment:
Python version: 3.11
...
-1
votes
1
answer
2k
views
How can I fix the GPU error of llama_cpp_python?
When I set n_gpu_layers to 1, I see the following response:
To learn Python, you can consider the following options:
1. Online Courses: Websites like Coursera, edX, Codecadem♠♦♥◄!▬$▲▅
`▅☻↑↨►☻...
1
vote
3
answers
6k
views
Enable GPU for Python programming with VS Code on Windows 10 (llama-cpp-python)
I struggled a lot while enabling GPU on my 32GB Windows 10 machine with a 4GB Nvidia P100 GPU during Python programming. My LLMs did not use the GPU of my machine while inferencing. After spending a few ...
2
votes
0
answers
1k
views
Connection error in langchain with llama2 model downloaded locally
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by ...
0
votes
2
answers
2k
views
CMAKE in requirements.txt file: Install llama-cpp-python for Mac
I have put my application into a Docker container and therefore I have created a requirements.txt file. Now I need to install llama-cpp-python for Mac, as I am loading my LLM with from langchain.llms import ...
1
vote
1
answer
2k
views
M1 Chip: Running Mistral-7B with Llama.cpp Works, but Python Wrapper Causes Slowdown and Errors
I'm working on a project using an M1 chip to run the Mistral-7B model. I've successfully set up llama.cpp and can run the model using the following command:
./build/bin/main --color --model "./../...
1
vote
0
answers
1k
views
llama-cpp-python on GPU: Delay between prompt submission and first token generation with longer prompts
I've been building a RAG pipeline using the llama-cpp-python OpenAI compatible server functionality and have been working my way up from running on just a laptop to running this on a dedicated ...
3
votes
1
answer
3k
views
LLM model is not loading into the GPU even after BLAS = 1, LlamaCpp, Langchain, Mistral 7b GGUF Model
Confession:
First of all, I am not an expert in this area; I am just practicing and trying to learn while working. Also, I am confused about whether this kind of model does not run on this type ...
1
vote
0
answers
347
views
Llama-2, Q4-Quantized model's response time on different CPUs
I am running a quantized llama-2 model from here. I am using 2 different machines.
11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz 2.80 GHz
16.0 GB (15.8 GB usable)
Inference time on this machine is ...
4
votes
1
answer
4k
views
No GPU support while running llama-cpp-python inside a docker container
I'm trying to run llama index with llama cpp by following the installation docs but inside a docker container.
Following this repo for installation of llama_cpp_python==0.2.6.
DOCKERFILE
# Use the ...
4
votes
1
answer
3k
views
How can I install llama-cpp-python with cuBLAS using poetry?
I can install llama cpp with cuBLAS using pip as below:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
However, I don't know how to install it with cuBLAS when ...
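Poetry forwards the parent environment to its build subprocesses, so exporting the same variables before poetry add is usually enough. A sketch driving it from Python; the flag names are taken from the question, and FORCE_CMAKE is an assumption about the build setup:
import os, subprocess

env = dict(os.environ)
env["CMAKE_ARGS"] = "-DLLAMA_CUBLAS=on"
env["FORCE_CMAKE"] = "1"

# poetry add builds the sdist in a subprocess that inherits this environment.
subprocess.run(["poetry", "add", "llama-cpp-python"], env=env, check=True)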
2
votes
0
answers
1k
views
llama-index: multiple calls to query_engine.query always give "Empty Response"
I have the following code that works as expected
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"
llm = LlamaCPP(model_url=...
1
vote
1
answer
1k
views
PandasQueryEngine from llama-index is unable to execute code with the following error: invalid syntax (, line 0)
I have the following code. I am trying to use the local llama2-chat-13B model. The instructions appear to be good but the final output is erroring out.
import logging
import sys
from IPython.display ...
1
vote
1
answer
3k
views
Unable to install llama-cpp-python Package in Python - Wheel Building Process gets Stuck
I’m trying to install the llama-cpp-python package in Python, but I’m encountering an issue where the wheel building process gets stuck. Here’s the command I’m using to install the package:
pip3 ...