0 votes · 1 answer · 305 views
I have been trying to install llama-cpp-python on Windows 11 with GPU support for a while, and it just doesn't work no matter what I try. I installed the necessary Visual Studio toolkit packages, ...
asked by MiszS
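
A quick way to confirm whether a given build actually uses the GPU is to load a model with layer offload enabled and read the startup log; a minimal sketch (replace the placeholder model path with a real GGUF file):

    from llama_cpp import Llama

    # verbose=True makes llama.cpp print its backend initialization, which
    # shows whether layers were offloaded to CUDA or stayed on the CPU
    llm = Llama(model_path="model.gguf", n_gpu_layers=-1, verbose=True)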

1 vote · 0 answers · 302 views
I tried to install llama-cpp-python via pip, but the installation fails with an error. The command that I ran: CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip install ...
asked by ZZISST

1 vote · 0 answers · 185 views
I am trying to set up local, high-speed NLP but am failing to install the arm64 version of llama-cpp-python. Even when I run CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \ ...
asked by Dennis Losett
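
One common cause of arm64 build failures on Apple silicon is an x86_64 Python running under Rosetta, which quietly builds x86_64 wheels; checking the interpreter's own architecture first can save a rebuild:

    import platform

    # should print 'arm64' on Apple silicon; 'x86_64' means this Python
    # runs under Rosetta and will compile x86_64 native extensions
    print(platform.machine())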

3 votes · 0 answers · 196 views
I am new to this. I have been trying but could not make the model answer questions about images. from llama_cpp import Llama import torch from PIL import Image import base64 llm = Llama( model_path='Holo1-...
asked by Abhash Rai
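
For LLaVA-style multimodal models, llama-cpp-python needs a separate CLIP projector file and a chat handler; a minimal sketch under that assumption (paths are placeholders, and other vision architectures need a different handler):

    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    # the projector (mmproj) file is distributed alongside the language model
    chat_handler = Llava15ChatHandler(clip_model_path="mmproj.gguf")
    llm = Llama(model_path="model.gguf", chat_handler=chat_handler, n_ctx=4096)

    out = llm.create_chat_completion(messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "file:///path/to/image.png"}},
            {"type": "text", "text": "Describe this image."},
        ],
    }])
    print(out["choices"][0]["message"]["content"])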

0 votes · 0 answers · 99 views
I am attempting to bundle a RAG agent into a .exe. However, when running the .exe I keep hitting the same two problems. The first problem is with locating llama-cpp, which I have fixed. The ...
asked by Arnab Mandal
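
When PyInstaller misses llama_cpp's compiled shared library, collecting the whole package from the .spec file is a common fix; a sketch, assuming a standard spec (app.py is a placeholder):

    # in the .spec file (PyInstaller spec files are plain Python)
    from PyInstaller.utils.hooks import collect_all

    # pulls in llama_cpp's native shared library and data files
    datas, binaries, hiddenimports = collect_all("llama_cpp")

    # Analysis is provided in the spec namespace by PyInstaller
    a = Analysis(["app.py"], datas=datas, binaries=binaries,
                 hiddenimports=hiddenimports)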

-1 votes · 2 answers · 589 views
Creating directory "llava_shared.dir\Release". Structured output is enabled. The formatting of compiler diagnostics will reflect the error hierarchy. See https://aka.ms/cpp/structured-output ...
asked by sandeep

0 votes · 0 answers · 90 views
I want a dataset of common n-grams and their log likelihoods. Normally I would download the Google Books Ngram Exports, but I wonder if I can generate a better dataset using a large language model. ...
asked by evashort
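
llama-cpp-python can report per-token log-probabilities for a prompt it is given, which is the building block for scoring n-grams; a minimal sketch, assuming a placeholder model path and a version where echo plus logprobs is supported:

    from llama_cpp import Llama

    # logits_all=True keeps scores for every prompt token, not just the last
    llm = Llama(model_path="model.gguf", logits_all=True)

    out = llm.create_completion("natural language", max_tokens=1,
                                echo=True, logprobs=1)
    lp = out["choices"][0]["logprobs"]
    # the first entry is None (no context before the first token); the rest
    # are conditional log-probabilities of each prompt token
    print(list(zip(lp["tokens"], lp["token_logprobs"])))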

1 vote · 0 answers · 143 views
I’m working on a project that requires fully deterministic outputs across different machines using Ollama. I’ve ensured the following parameters are identical: model quantization (e.g., llama2:7b-q4_0)...
asked by user29255210
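
Pinning the sampler means fixing the seed and removing randomness in the request options; a sketch against Ollama's HTTP API. Note that even identical settings do not guarantee bit-identical output across different hardware, since floating-point kernels differ between backends:

    import requests

    resp = requests.post("http://localhost:11434/api/generate", json={
        "model": "llama2:7b-q4_0",
        "prompt": "Say hello.",
        "stream": False,
        # seed + temperature 0 makes sampling deterministic on one machine
        "options": {"seed": 42, "temperature": 0},
    })
    print(resp.json()["response"])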

0 votes · 0 answers · 256 views
I'm experiencing significant performance and output-quality issues when running the LLaMA 13B model using the llama_cpp library on my laptop. The same setup works efficiently with the LLaMA 7B model. ...
asked by Farzand Ali

0 votes · 0 answers · 268 views
After updating llama-cpp-python I am getting an error when trying to run an ARM-optimized GGUF model: TYPE_Q4_0_4_4 REMOVED, use Q4_0 with runtime repacking. After looking into it, the error comes from ...
asked by ekcrisp

2 votes · 1 answer · 1k views
I want my LLM chatbot to remember previous conversations even after restarting the program. It is made with llama-cpp-python and LangChain; it has conversation memory for the current chat, but obviously ...
asked by QUARKS
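
Persistence can be as simple as serializing the message list to disk between runs; a minimal sketch (the file name is a placeholder):

    import json
    import os

    HISTORY_FILE = "chat_history.json"  # hypothetical location

    def load_history():
        # restore previous conversations, or start fresh on first run
        if os.path.exists(HISTORY_FILE):
            with open(HISTORY_FILE) as f:
                return json.load(f)
        return []

    def save_history(messages):
        # call after every exchange so a crash loses at most one turn
        with open(HISTORY_FILE, "w") as f:
            json.dump(messages, f)

The loaded list can then seed the chain's memory object on startup.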

1 vote · 0 answers · 66 views
I was implementing RAG on a document using the Llama 2 model, but my model is asking questions to itself and answering them. llm = LlamaCpp(model_path=model_path, temperature=0, ...
asked by Knox
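
Self-dialogue like this is usually curbed with stop sequences that cut generation as soon as the model starts a new "Question:" turn; a sketch using LangChain's LlamaCpp wrapper (the stop strings are examples and must match your prompt template):

    from langchain_community.llms import LlamaCpp

    llm = LlamaCpp(model_path=model_path, temperature=0,
                   stop=["\nQuestion:", "\nQ:"])  # example stop strings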

0 votes · 0 answers · 180 views
I start the llama-cpp-python server with the command: python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --n_ctx 8192 --chat_format functionary Then I run my Python script, which ...
asked by Jengi829
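
That server speaks the OpenAI wire protocol, so the client side is typically the openai package pointed at the local endpoint; a sketch, assuming the server's default port 8000:

    from openai import OpenAI

    # llama_cpp.server ignores the API key; any non-empty string works
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="local-model",  # the server serves whatever model it loaded
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.choices[0].message.content)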

2 votes · 1 answer · 950 views
I want to choose the next token manually, instead of letting llama-cpp-python automatically choose one for me. This requires me to see a list of candidate next tokens, along with their probabilities, ...
asked by caveman
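
The high-level API exposes candidates through the OpenAI-style logprobs field; a sketch (the model path is a placeholder, and logprobs requires logits_all=True at load time):

    from llama_cpp import Llama

    llm = Llama(model_path="model.gguf", logits_all=True)

    out = llm.create_completion("Once upon a", max_tokens=1, logprobs=10)
    # a dict mapping the 10 most likely next tokens to their log-probabilities
    print(out["choices"][0]["logprobs"]["top_logprobs"][0])

To steer generation manually, append the chosen token's text to the prompt and repeat.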

0 votes · 2 answers · 856 views
code: from langchain_community.vectorstores import FAISS from langchain_community.embeddings import HuggingFaceEmbeddings from langchain import PromptTemplate from langchain_community.llms import ...
asked by Ashish Sawant

1 vote · 0 answers · 168 views
I am trying to build a chatbot using LangChain. This chatbot supports different backends: Ollama, Hugging Face, llama.cpp, OpenAI. In a YAML file, I can configure the backend (aka provider) and the ...
asked by Salvatore D'angelo
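
A small factory keyed on the provider field keeps the YAML-driven switching in one place; a sketch with illustrative config keys:

    from langchain_community.llms import LlamaCpp, Ollama

    def build_llm(cfg: dict):
        # cfg is the dict parsed from the YAML file; keys are hypothetical
        provider = cfg["provider"]
        if provider == "ollama":
            return Ollama(model=cfg["model"])
        if provider == "llamacpp":
            return LlamaCpp(model_path=cfg["model_path"])
        raise ValueError(f"unknown provider: {provider}")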

0 votes · 1 answer · 596 views
I'm trying to create a service using the llama3-70b model by combining LangChain and llama-cpp-python on a server workstation. While the model works well with short prompts (question1, question2), it ...
asked by bibiibibin

1 vote · 2 answers · 733 views
When I try installing llama.cpp, I get the following error: ld: warning: ignoring file '/Users/krishparikh/Projects/LLM/llama.cpp/ggml/src/ggml-metal-embed.o': found architecture 'x86_64', required ...
asked by Krish Parikh

0 votes · 1 answer · 634 views
I am using the Mistral 7B-instruct model with llama-index and load the model using llamacpp. When I try to run multiple inputs or prompts (open 2 websites and send 2 prompts), it gives me ...
asked by HelloALive
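
A single Llama instance holds one mutable context and is not safe to call from concurrent requests; the simplest guard is to serialize access (a sketch, assuming llm is the shared llama-cpp-python instance):

    import threading

    lock = threading.Lock()  # one shared model => one request at a time

    def generate(prompt: str) -> str:
        with lock:
            return llm(prompt, max_tokens=256)["choices"][0]["text"]

True parallelism needs multiple model instances or a server that queues requests.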

2 votes · 2 answers · 4k views
Question: How can I programmatically check if llama-cpp-python is installed with support for a CUDA-capable GPU? Context: In my program, I am trying to warn the developers when they fail to configure ...
asked by Programmer.zip
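
Recent llama-cpp-python builds expose llama.cpp's capability query for exactly this; a sketch, assuming a version new enough to ship the binding:

    import llama_cpp

    # True only when the underlying llama.cpp was compiled with a GPU backend
    if not llama_cpp.llama_supports_gpu_offload():
        print("llama-cpp-python was built without GPU offload support")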

0 votes · 1 answer · 244 views
I am trying to install llama-cpp-python on Windows 11. I have installed MinGW and set the CMAKE_ARGS environment variable to point to its gcc.exe and g++.exe to compile C and C++, but am struggling ...
asked by Leo Turoff

-1 votes · 2 answers · 1k views
I am facing ImportError: cannot import name 'LlamaCPP' from 'llama_index.llms' (unknown location) and ModuleNotFoundError: No module named 'llama_index.llms.llama_utils' while ...
asked by shubham joshi2014
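
Since llama-index 0.10 the integrations live in separate pip packages with new import paths, which produces exactly these errors on pre-0.10 import lines; a sketch of the updated usage (the model path is a placeholder):

    # pip install llama-index-llms-llama-cpp
    from llama_index.llms.llama_cpp import LlamaCPP

    llm = LlamaCPP(model_path="model.gguf")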

1 vote · 0 answers · 862 views
I want to use llama-3 with llama-cpp-python and get a direct answer to user questions, as I did with llama-2. But the answers generated by llama-3 are not direct answers like llama-2's. Output: Hey! 👋 What can I help you ...
asked by Dalipboy M
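
Chatty, off-target replies often come from the wrong prompt template; recent llama-cpp-python versions register a chat format for Llama 3, and a sketch under that assumption (the path is a placeholder):

    from llama_cpp import Llama

    # chat_format="llama-3" applies Llama 3's special-token prompt template
    llm = Llama(model_path="Meta-Llama-3-8B-Instruct.gguf",
                chat_format="llama-3")

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "What is the capital of France?"}]
    )
    print(out["choices"][0]["message"]["content"])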

1 vote · 0 answers · 542 views
I am integrating the llama-cpp-python library to run Hugging Face LLMs locally. I am able to generate text output, but I would like to add streaming to my chatbot so that as soon as the generation is ...
asked by Ashish Gupta
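
Both completion APIs accept stream=True and then yield incremental chunks instead of one final dict; a minimal sketch (llm and messages assumed in scope):

    for chunk in llm.create_chat_completion(messages=messages, stream=True):
        delta = chunk["choices"][0]["delta"]
        # the first chunk carries the role only, so "content" may be absent
        print(delta.get("content", ""), end="", flush=True)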

3 votes · 1 answer · 4k views
I'm reaching out to the community for some assistance with an issue I'm encountering in llama.cpp. Previously, the program was successfully utilizing the GPU for execution. However, recently it seems ...
asked by Montassar Jaziri

2 votes · 1 answer · 4k views
I am trying to run RAG with the Gemma LLM locally. It runs fine, but I can't handle more than one request at a time. Is there a way to handle concurrent requests while utilizing resources ...
asked by khalidwalamri

1 vote · 0 answers · 65 views
I created embeddings for only one document so far. But when I ask questions which might be in the context but are definitely not part of this single document, I would expect an answer like "I ...
asked by m1ch4
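
Models rarely refuse on their own; the usual approach is to constrain the answer to the retrieved context in the prompt template itself. A sketch (the exact wording is a matter of prompt design):

    # a restrictive RAG prompt: the model is told to refuse when the
    # retrieved context does not contain the answer
    TEMPLATE = """Answer the question using only the context below.
    If the answer is not in the context, reply exactly: "I don't know."

    Context: {context}

    Question: {question}
    Answer:"""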

1 vote · 1 answer · 577 views
I am trying to run the same prompt (query) I previously ran on a simple PDF (a legal pt-br document) with pure llama-cpp-python, but now using llama-index: from llama_cpp import Llama import os, re, sys from pypdf ...
asked by celsowm

0 votes · 0 answers · 840 views
I am following the instructions from the official documentation on how to install llama-cpp with GPU support on an Apple silicon Mac. Here is my Dockerfile: FROM python:3.11-slim WORKDIR /code RUN pip ...
asked by Kristada673

3 votes · 1 answer · 5k views
I have built a RAG application with LangChain and now want to deploy it with FastAPI. Generally it works to call a FastAPI endpoint and have the answer of the LCEL chain streamed. However, I want ...
asked by Maxl Gemeinderat
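
For token-by-token streaming over HTTP, FastAPI's StreamingResponse can wrap the chain's async stream; a sketch where chain stands for the LCEL chain and is assumed to be in scope:

    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    @app.get("/ask")
    async def ask(q: str):
        async def token_stream():
            # astream() yields output chunks as the chain produces them
            async for chunk in chain.astream(q):
                yield str(chunk)
        return StreamingResponse(token_stream(), media_type="text/plain")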

1 vote · 2 answers · 7k views
I followed the instructions on https://llama-cpp-python.readthedocs.io/en/latest/install/macos/. My macOS version is Sonoma 14.4, and xcode-select is already installed (version: 15.3.0.0.1.1708646388). ...
asked by ooyeon

1 vote · 0 answers · 2k views
I am attempting to load the Zephyr model into llama_cpp Llama, and while everything functions correctly, the performance is slow. The GPU appears to be underutilized, especially when compared to its ...
asked by reach

0 votes · 2 answers · 4k views
I am trying to load embeddings like this. I changed the code to reflect the current version change in LlamaIndex, but it raises an attribute error. from llama_index.embeddings.huggingface import ...
asked by Rahul_51
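
With llama-index 0.10+ the Hugging Face embedding moved into its own package and the class name is singular; a sketch (the model name is just an example):

    # pip install llama-index-embeddings-huggingface
    from llama_index.embeddings.huggingface import HuggingFaceEmbedding

    embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
    print(len(embed_model.get_text_embedding("hello world")))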

1 vote · 3 answers · 2k views
I've been comparing various LangChain-compatible llama2 runtimes using a LangChain LLM chain, with the following parameter overrides: # llama.cpp: model_path="../llama.cpp/models/generated/...
asked by JayabalanAaron

1 vote · 0 answers · 338 views
I have built a RAG app with LlamaCpp and LangServe, and it generally works. However, I can't find a way to stream my responses, which would be very important for the application. Here is my code: from ...
asked by Maxl Gemeinderat

1 vote · 1 answer · 2k views
I use llama-cpp-python to run LLMs locally on Ubuntu. While generating responses it prints its logs. How can I stop the logs from being printed? I found a way to stop log printing for llama.cpp but not for llama-...
asked by San Vik
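
The constructor's verbose flag controls most of llama.cpp's stderr chatter; a minimal sketch (the model path is a placeholder):

    from llama_cpp import Llama

    # verbose=False silences llama.cpp's model-load and inference logging
    llm = Llama(model_path="model.gguf", verbose=False)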

0 votes · 2 answers · 751 views
I'm currently taking DeepAI's Finetuning Coursera course and encountered a bug while trying to run one of their demonstrations locally in a Jupyter notebook. Environment: Python version: 3.11 ...
asked by Hofbr

-1 votes · 1 answer · 2k views
When I set n_gpu_layer to 1, I can see the following response: To learn Python, you can consider the following options: 1. Online Courses: Websites like Coursera, edX, Codecadem♠♦♥◄!▬$▲▅ `▅☻↑↨►☻...
asked by Phương Nguyễn

1 vote · 3 answers · 6k views
I struggled a lot while enabling the GPU on my 32GB Windows 10 machine with a 4GB Nvidia P100 GPU during Python programming. My LLMs did not use the GPU of my machine while inferencing. After spending a few ...
asked by Umaima Tinwala

2 votes · 0 answers · 1k views
raise ConnectionError(e, request=request) requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=11434): Max retries exceeded with url: /api/generate/ (Caused by ...
asked by Abhishek Kapoor
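
Port 11434 is Ollama's default, so this usually means the Ollama server is not running or not reachable; a quick health check (the root endpoint replies with a short status string when the daemon is up):

    import requests

    # prints "Ollama is running" when the server is reachable
    print(requests.get("http://localhost:11434/", timeout=5).text)

Separately, the traceback shows /api/generate/ with a trailing slash; the documented endpoint is /api/generate.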

0 votes · 2 answers · 2k views
I have put my application into a Docker container and therefore have created a requirements.txt file. Now I need to install llama-cpp-python for Mac, as I am loading my LLM with: from langchain.llms import ...
asked by Maxl Gemeinderat

1 vote · 1 answer · 2k views
I'm working on a project using an M1 chip to run the Mistral-7B model. I've successfully set up llama.cpp and can run the model using the following command: ./build/bin/main --color --model "./../...
asked by Max Witwer

1 vote · 0 answers · 1k views
I've been building a RAG pipeline using the llama-cpp-python OpenAI-compatible server functionality and have been working my way up from running on just a laptop to running this on a dedicated ...
asked by jhthompson12

3 votes · 1 answer · 3k views
Confession: I am not an expert at all in this area; I am just practicing and trying to learn while working. Also, I am confused about whether this kind of model does not run on this type ...
asked by Mahmud Arfan

1 vote · 0 answers · 347 views
I am running a quantized llama-2 model from here. I am using 2 different machines: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80 GHz, 16.0 GB RAM (15.8 GB usable). Inference time on this machine is ...
asked by Muhammad Burhan

4 votes · 1 answer · 4k views
I'm trying to run llama-index with llama.cpp by following the installation docs, but inside a Docker container. I am following this repo for installation of llama_cpp_python==0.2.6. DOCKERFILE: # Use the ...
asked by Pratyush

4 votes · 1 answer · 3k views
I can install llama-cpp-python with cuBLAS using pip as below: CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python However, I don't know how to install it with cuBLAS when ...
asked by KimuGenie

2 votes · 0 answers · 1k views
I have the following code that works as expected: model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf" llm = LlamaCPP(model_url=...
asked by Jamie Dixon

1 vote · 1 answer · 1k views
I have the following code. I am trying to use the local llama2-chat-13B model. The instructions appear to be good, but the final output is erroring out. import logging import sys from IPython.display ...
asked by Birender Singh

1 vote · 1 answer · 3k views
I’m trying to install the llama-cpp-python package in Python, but I’m encountering an issue where the wheel-building process gets stuck. Here’s the command I’m using to install the package: pip3 ...
asked by Illanser