
I would like to deploy a Hugging Face text embedding model endpoint via AWS SageMaker.

Here is my code so far:

import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel

# sess = sagemaker.Session()
role = sagemaker.get_execution_role()

# Hub Model configuration: https://huggingface.co/models
hub = {
  'HF_MODEL_ID':'sentence-transformers/all-MiniLM-L12-v2', # model_id from hf.co/models
  'HF_TASK':'feature-extraction' # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub, # configuration for loading model from Hub
    role=role, # iam role with permissions to create an Endpoint
    py_version='py36',
    transformers_version="4.6", # transformers version used
    pytorch_version="1.7", # pytorch version used
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge"
)

data = {
    "inputs": ["This is an example sentence", "Each sentence is converted"]
}

result = predictor.predict(data)
print(len(result[0]))
print(result[0])

While this does deploy an endpoint successfully, it does not behave the way it should. I expect each string in the input list to yield a 1x384 list of floats as output. Instead, I get a 7x384 list for each sentence. Did I maybe use the wrong pipeline?

2 Answers


The output you are seeing is the default produced by that model: the feature-extraction task returns one 384-dimensional vector per token (7 tokens here, including the special [CLS] and [SEP] tokens), not one per sentence. To shape the output the way you expect, you can either pool the token vectors on the client side (once the output is received) or attach an inference.py script that implements functions to shape the output: specifically predict_fn and output_fn.
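
For the client-side option, a minimal sketch (assuming the response is, per input sentence, a nested list of per-token vectors, as in your 7x384 example; mean pooling is what sentence-transformers uses for this model):

import numpy as np

result = predictor.predict(data)
# result[i] is assumed to be a tokens x 384 nested list for sentence i
sentence_embeddings = [np.asarray(tokens).mean(axis=0) for tokens in result]
print(sentence_embeddings[0].shape)  # (384,): one vector per sentence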

Example of the inference.py approach: https://github.com/huggingface/notebooks/tree/main/sagemaker/17_custom_inference_script/code

from transformers import AutoTokenizer, AutoModel
import torch

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, weighted by the attention mask
    token_embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, 1) / torch.clamp(mask.sum(1), min=1e-9)

def model_fn(model_dir):
    # Load model and tokenizer from the local model directory
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir)
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # destructure model and tokenizer
    model, tokenizer = model_and_tokenizer
    # Tokenize, run the model, and mean-pool to one vector per sentence (completion follows the notebook linked above)
    sentences = data.pop("inputs", data)
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        model_output = model(**encoded_input)
    return {"vectors": mean_pooling(model_output, encoded_input["attention_mask"]).tolist()}
   

There are two ways to deploy Hugging Face models as SageMaker endpoints:

  1. The way you have done it: defining env=hub inside the HuggingFaceModel class. This is a nice and quick way to get inferences from the model without any custom processing. You send a request and get the response in the model's raw output form.
  2. If you want to do more with each request sent to the model, i.e. preprocess the inputs, change the model's behaviour and/or postprocess the output, you will need to use custom scripts packaged with the model (see the sketch after the parameters below). Sample HuggingFaceModel class parameters are:
huggingface_model = HuggingFaceModel(
    model_data=s3_location,         # path to your model and script
    role=role,                      # IAM role with permissions to create an endpoint
    transformers_version="4.37.0",  # transformers version used
    pytorch_version="2.1.0",        # pytorch version used
    py_version="py310",             # python version used
    # model_server_workers=1,
    # image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.1.0-transformers4.37.0-cpu-py310-ubuntu22.04",
    # env=hub
)
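
A minimal sketch of deploying this model and what the model_data archive typically contains (the layout below is the inference toolkit's convention, not taken from the question; s3_location is assumed to point at such an archive):

# Assumed layout of the model.tar.gz behind s3_location:
#
# model.tar.gz
# |-- config.json, pytorch_model.bin, tokenizer files   (model artifacts)
# `-- code/
#     |-- inference.py        (model_fn / predict_fn / output_fn overrides)
#     `-- requirements.txt    (optional extra pip dependencies)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
print(predictor.predict({"inputs": ["This is an example sentence"]}))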

This is the complete reference you need: https://github.com/huggingface/notebooks/blob/main/sagemaker/17_custom_inference_script/sagemaker-notebook.ipynb

Additional info: the handler file that runs with each request your endpoint receives: https://github.com/aws/sagemaker-huggingface-inference-toolkit/blob/main/src/sagemaker_huggingface_inference_toolkit/handler_service.py
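
That handler looks for optional override hooks in your inference.py; a sketch of their standard signatures (each hook falls back to a toolkit default if omitted):

def model_fn(model_dir): ...                 # load and return the model object(s)
def input_fn(input_data, content_type): ...  # deserialize the request body
def predict_fn(data, model): ...             # run inference on the deserialized input
def output_fn(prediction, accept): ...       # serialize the prediction for the response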

2 Comments

Not the answer I was looking for back then, but it fixed half the error I am currently facing, thx^^
Happy to help in any way
