
I would like to deploy a Hugging Face text embedding model endpoint via AWS SageMaker.

Here is my code so far:

import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel

# sess = sagemaker.Session()
role = sagemaker.get_execution_role()

# Hub Model configuration: https://huggingface.co/models
hub = {
  'HF_MODEL_ID':'sentence-transformers/all-MiniLM-L12-v2', # model_id from hf.co/models
  'HF_TASK':'feature-extraction' # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub, # configuration for loading model from Hub
    role=role, # iam role with permissions to create an Endpoint
    py_version='py36',
    transformers_version="4.6", # transformers version used
    pytorch_version="1.7", # pytorch version used
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge"
)

data = {
    "inputs": ["This is an example sentence", "Each sentence is converted"]
}

result = predictor.predict(data)
print(len(result[0]))
print(result[0])

While this does deploy an endpoint successfully, it does not behave the way it should. I expect each string in the input list to yield a 1x384 list of floats as output. Instead, I get a 7x384 list for each sentence. Did I maybe use the wrong pipeline?

2 Answers


The output you are seeing is the default produced by that model: the feature-extraction task returns one 384-dimensional vector per token (7 tokens here, including the special [CLS] and [SEP] tokens), not one per sentence. To shape the output the way you expect, you can either pool the token vectors on the client side (once the output is received) or attach an inference.py script that implements functions to shape the output: specifically predict_fn and output_fn.
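
For the client-side option, a minimal sketch (assuming the response is, per input sentence, a nested list of per-token vectors, as in your 7x384 example; mean pooling is what sentence-transformers uses for this model):

import numpy as np

result = predictor.predict(data)
# result[i] is assumed to be a tokens x 384 nested list for sentence i
sentence_embeddings = [np.asarray(tokens).mean(axis=0) for tokens in result]
print(sentence_embeddings[0].shape)  # (384,): one vector per sentence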

Example of the inference.py approach: https://github.com/huggingface/notebooks/tree/main/sagemaker/17_custom_inference_script/code

from transformers import AutoTokenizer, AutoModel
import torch

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, weighted by the attention mask
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

def model_fn(model_dir):
    # Load model and tokenizer from the local model directory
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # destructure model and tokenizer
    model, tokenizer = model_and_tokenizer
    # Tokenize, run the model, and mean-pool to one vector per sentence (completion follows the notebook linked above)
    sentences = data.pop("inputs", data)
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        model_output = model(**encoded_input)
    return {"vectors": mean_pooling(model_output, encoded_input["attention_mask"]).tolist()}
   

There are two ways to deploy Hugging Face models as SageMaker endpoints:

  1. The way you have done it: defining env=hub inside the HuggingFaceModel class. This is a nice and quick way to get inferences from the model without any custom processing. You send a request and get the response in the model's raw output form.
  2. If you want to do more with each request sent to the model, i.e. preprocess the inputs, change the model's behaviour and/or postprocess the output, you will need to use custom scripts packaged with the model (see the sketch after the parameters below). Sample HuggingFaceModel class parameters are:
huggingface_model = HuggingFaceModel(
    model_data=s3_location,         # path to your model and script
    role=role,                      # IAM role with permissions to create an endpoint
    transformers_version="4.37.0",  # transformers version used
    pytorch_version="2.1.0",        # pytorch version used
    py_version="py310",             # python version used
    # model_server_workers=1,
    # image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.1.0-transformers4.37.0-cpu-py310-ubuntu22.04",
    # env=hub
)
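
A minimal sketch of deploying this model and what the model_data archive typically contains (the layout below is the inference toolkit's convention, not taken from the question; s3_location is assumed to point at such an archive):

# Assumed layout of the model.tar.gz behind s3_location:
#
# model.tar.gz
# |-- config.json, pytorch_model.bin, tokenizer files   (model artifacts)
# `-- code/
#     |-- inference.py        (model_fn / predict_fn / output_fn overrides)
#     `-- requirements.txt    (optional extra pip dependencies)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
print(predictor.predict({"inputs": ["This is an example sentence"]}))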

This is the complete reference you need: https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb

Additional info: the handler file that runs with each request your endpoint receives: https://github.com/aws/sagemaker-huggingface-inference-toolkit/blob/main/src/sagemaker_huggingface_inference_toolkit/handler_service.py
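
That handler looks for optional override hooks in your inference.py; a sketch of their standard signatures (each hook falls back to a toolkit default if omitted):

def model_fn(model_dir): ...                 # load and return the model object(s)
def input_fn(input_data, content_type): ...  # deserialize the request body
def predict_fn(data, model): ...             # run inference on the deserialized input
def output_fn(prediction, accept): ...       # serialize the prediction for the response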

2 Comments

Not the answer I was looking for back then, but it fixed half the error I am currently facing, thx^^
Happy to help in any way
