I would like to deploy a Hugging Face text embedding model endpoint via AWS SageMaker.
Here is my code so far:
import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel

# sess = sagemaker.Session()
role = sagemaker.get_execution_role()

# Hub model configuration: https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'sentence-transformers/all-MiniLM-L12-v2',  # model_id from hf.co/models
    'HF_TASK': 'feature-extraction'  # NLP task to use for predictions
}

# create the Hugging Face Model class
huggingface_model = HuggingFaceModel(
    env=hub,                     # configuration for loading the model from the Hub
    role=role,                   # IAM role with permissions to create an endpoint
    py_version='py36',
    transformers_version='4.6',  # Transformers version used
    pytorch_version='1.7',       # PyTorch version used
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge'
)

data = {
    "inputs": ["This is an example sentence", "Each sentence is converted"]
}

result = predictor.predict(data)
print(len(result[0]))
print(result[0])
While this does deploy an endpoint successfully, it does not behave the way it should. I expect each string in the input list to yield a 1x384 list of floats as output, but instead I get a 7x384 list for each sentence. Did I maybe use the wrong pipeline?
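To illustrate the shape I was expecting: a minimal sketch of pooling the 7x384 output down to one 384-dim vector per sentence, on the assumption that the 7 rows are token-level embeddings. The endpoint response is simulated here with dummy data, so this runs standalone:

```python
import numpy as np

def mean_pool(token_embeddings):
    # token_embeddings: per sentence, a [num_tokens][384] list of lists.
    # Averaging over the token axis gives one 384-dim vector per sentence.
    return [np.mean(np.array(sent), axis=0).tolist() for sent in token_embeddings]

# Simulated endpoint output: 2 sentences, 7 tokens each, 384 dims per token
result = [[[0.1] * 384 for _ in range(7)] for _ in range(2)]

pooled = mean_pool(result)
print(len(pooled), len(pooled[0]))  # 2 384
```

This is the shape I want the endpoint itself to return, rather than something I pool client-side.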