I’m trying to generate embeddings using the Hugging Face Inference API with LangChain in Python, but I’m running into issues. My goal is to use the API (not local models) to generate embeddings for text, specifically with the sentence-transformers/all-MiniLM-L6-v2 model.

Environment: langchain-huggingface==0.2.0 and huggingface-hub==0.31.2 in a Conda environment with Python 3.10. I have a valid Hugging Face API token set as an environment variable (HUGGINGFACEHUB_API_TOKEN), and I want to integrate this with LangChain for embeddings.
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
import os

os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_MyTokenHere"  # Valid token confirmed

embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=os.environ["HUGGINGFACEHUB_API_TOKEN"],
    model_name="sentence-transformers/all-MiniLM-L6-v2",
)

# Fails here — embed_documents fails the same way, as the traceback below shows
vector = embeddings.embed_query("Test query")
Error:
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
File d:\Python_Env\LangChain\venv\lib\site-packages\requests\models.py:974, in Response.json(self, **kwargs)
973 try:
--> 974 return complexjson.loads(self.text, **kwargs)
975 except JSONDecodeError as e:
976 # Catch JSON-related errors and raise as requests.JSONDecodeError
977 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
File d:\Python_Env\LangChain\venv\lib\json\__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
343 if (cls is None and object_hook is None and
344 parse_int is None and parse_float is None and
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
347 if cls is None:
File d:\Python_Env\LangChain\venv\lib\json\decoder.py:337, in JSONDecoder.decode(self, s, _w)
333 """Return the Python representation of ``s`` (a ``str`` instance
334 containing a JSON document).
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
File d:\Python_Env\LangChain\venv\lib\json\decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
354 except StopIteration as err:
--> 355 raise JSONDecodeError("Expecting value", s, err.value) from None
356 return obj, end
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
JSONDecodeError Traceback (most recent call last)
Cell In[12], line 16
14 text = "LangChain makes LLM applications modular"
15 #query_embedding = embeddings.embed_query(text)
---> 16 document_embeddings = embeddings.embed_documents([text, "HuggingFace provides great models"])
File d:\Python_Env\LangChain\venv\lib\site-packages\langchain_community\embeddings\huggingface.py:472, in HuggingFaceInferenceAPIEmbeddings.embed_documents(self, texts)
441 """Get the embeddings for a list of texts.
442
443 Args:
(...)
462 hf_embeddings.embed_documents(texts)
463 """ # noqa: E501
464 response = requests.post(
465 self._api_url,
466 headers=self._headers,
(...)
470 },
471 )
--> 472 return response.json()
File d:\Python_Env\LangChain\venv\lib\site-packages\requests\models.py:978, in Response.json(self, **kwargs)
974 return complexjson.loads(self.text, **kwargs)
975 except JSONDecodeError as e:
976 # Catch JSON-related errors and raise as requests.JSONDecodeError
977 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 978 raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
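To see what the endpoint actually returns before the `.json()` call, I can hit it directly and dump the raw response. This is a diagnostic sketch using only the standard library; the URL mirrors the pipeline/feature-extraction endpoint that langchain_community builds internally (the exact URL is my reading of the installed source, so treat it as an assumption):

```python
import json
import os
import urllib.error
import urllib.request

# Assumed to match the _api_url that HuggingFaceInferenceAPIEmbeddings posts to
API_URL = (
    "https://api-inference.huggingface.co/pipeline/feature-extraction/"
    "sentence-transformers/all-MiniLM-L6-v2"
)

def fetch_raw(token: str) -> tuple[int, str]:
    """POST one query and return (status, raw body) without JSON-decoding it."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps({"inputs": ["Test query"]}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        # A 4xx/5xx body here would explain the JSONDecodeError: the client
        # tries to parse an error page as if it were embedding JSON.
        return err.code, err.read().decode()

token = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
if token:
    status, body = fetch_raw(token)
    print(status, body[:300])
```

If the status is not 200 or the body starts with HTML or plain text instead of `[`, that would explain "Expecting value: line 1 column 1 (char 0)".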
What I’ve Verified:
My Hugging Face token is valid and has Inference API permissions.
The model sentence-transformers/all-MiniLM-L6-v2 is public and accessible via the UI.
I’m on the latest versions of the relevant packages (langchain-huggingface 0.2.0, huggingface-hub 0.31.2).

How can I get embeddings from the Inference API through LangChain, or at least see what the endpoint is returning instead of JSON?
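As a cross-check outside LangChain, I can call the same model through huggingface_hub's own InferenceClient. This is a sketch assuming huggingface-hub 0.31.2 as in my environment; feature_extraction returns the embedding directly, so if this works the problem is on the LangChain side:

```python
import os

def embed_with_hub(text: str, token: str):
    """Embed one string via huggingface_hub's InferenceClient."""
    from huggingface_hub import InferenceClient

    client = InferenceClient(token=token)
    # feature_extraction returns the raw embedding for the input text
    return client.feature_extraction(
        text,
        model="sentence-transformers/all-MiniLM-L6-v2",
    )

token = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
if token:
    vec = embed_with_hub("Test query", token)
    print(len(vec))  # all-MiniLM-L6-v2 embeddings are 384-dimensional
```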