Use Hugging Face Instructor with AWS SageMaker

Question

I'm trying to use AWS SageMaker to get inferences from the Instructor model (i.e., to generate embeddings based on the Instructor model) but am running into various roadblocks upon calling predict() on the model class in SageMaker. What might be the issue with the below setup, and what is the appropriate way to structure a custom inference script like the one used below?

Some context:

  • For question answering tasks, the Instructor model takes the following inputs: an instruction (e.g., "Represent the Medical title for retrieving") and a query string (e.g., "The title of a medical paper").
  • Thus, when generating inferences using Instructor, I want to be able to pass in both an instruction and a query (a minimal local usage sketch follows this list).
  • As far as I understand, this means that I cannot use the nice utility that allows you to deploy models straight from Hugging Face (because there is no room in this approach for customizing instructions + queries).
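
For reference, this is roughly how an instruction/query pair is passed when running Instructor locally (a minimal sketch based on the InstructorEmbedding README; the instruction and query strings are just the placeholders used later in this post):

from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR('hkunlp/instructor-xl')

# Each input to encode() is an [instruction, text] pair; the result is a
# numpy array with one embedding vector per pair.
instruction = "Represent the Medical title for retrieving relevant documents:"
query = "Which paper discuss anemia?"
embeddings = model.encode([[instruction, query]])
print(embeddings.shape)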

Currently, using this guide from Hugging Face as a reference, I've set things up as described below.

I download the Instructor repository, create a code subdirectory, and create inference.py and requirements.txt in that subdirectory.

inference.py contains:

from InstructorEmbedding import INSTRUCTOR
import torch

def model_fn(model_dir):
    model = INSTRUCTOR('hkunlp/instructor-xl')
    model.max_seq_length = 768
    
    return model


def input_fn(input_data, content_type):
    return input_data


def output_fn(prediction, accept):
    return prediction


def predict_fn(processed_data, model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    data = processed_data['data']
    embedding_instruction = processed_data['embedding_instruction']
    
    documents = data['documents']
    metadatas = data['metadatas']
    ids = data['ids']
    
    embeddings = model.encode([[embedding_instruction, doc] for doc in documents], device=device)

    # return a dictionary, which will be JSON serializable
    return {"embeddings": embeddings.tolist(), 'metadatas': metadatas, 'ids': ids}
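
(For context, my understanding is that the SageMaker Hugging Face inference toolkit calls these handlers roughly as follows: model_fn once when the worker starts, then input_fn, predict_fn, and output_fn for each request. A simplified sketch of that chain, not the toolkit's actual code:)

# Simplified view of the handler chain (illustrative only).
model = model_fn("/opt/ml/model")                       # runs once at startup

def handle(request_body, content_type, accept):
    processed = input_fn(request_body, content_type)    # raw request bytes in
    prediction = predict_fn(processed, model)            # run the model
    return output_fn(prediction, accept)                 # serialized response out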

requirements.txt contains:

InstructorEmbedding>=1.0.1
transformers==4.20.0
datasets>=2.2.0
jsonlines
numpy
requests>=2.26.0
scikit_learn>=1.0.2
scipy
sentence_transformers>=2.2.0
torch
tqdm
rich

I then run the following from the model directory to package the model and custom inference code and upload it to an S3 bucket:

!tar zcvf model.tar.gz *

!aws s3 cp model.tar.gz $s3_location
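
The archive that ends up in S3 therefore has roughly this layout (the top level is whatever the Instructor repository checkout contains; the inference toolkit picks up the custom handlers from the code/ subdirectory):

model.tar.gz
├── <contents of the Instructor repository>
└── code/
    ├── inference.py
    └── requirements.txt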

Finally, to create the model and deploy a realtime inference endpoint I run:

from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_s3_location,      # path to model and script
    role=role,                         # iam role with permissions to create an endpoint
    transformers_version="4.26",       # transformers version used
    pytorch_version="1.13",            # pytorch version used
    py_version='py39',                 # python version used
)

hf_predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.r5.xlarge")

query = 'Which paper discuss anemia?'

data = {'data': {'documents': [query],
                 'metadatas': ['NONE'],
                 'ids': ['NONE']},
        'embedding_instruction': "Represent the Medical title for retrieving relevant documents:"}

hf_predictor.predict(data)

The above throws the following error:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": ""
}

Looking into the logs, it appears that the InstructorEmbedding package installs fine:
Successfully installed InstructorEmbedding-1.0.1 ...

Then there are these possibly relevant excerpts from three exceptions. Based on them, it looks like inference.py is being read, but I'm not sure what to make of the rest.

First exception:

  • 2023-06-12T00:55:14,736 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - load INSTRUCTOR_Transformer
  • 2023-06-12T00:55:14,737 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
  • 2023-06-12T00:55:14,737 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/modeling_utils.py", line 461, in load_state_dict
  • 2023-06-12T00:55:14,738 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 1079, in load_tensor
  • 2023-06-12T00:55:14,738 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage).storage().untyped()
  • 2023-06-12T00:55:14,738 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - OSError: [Errno 14] Bad address

Second exception:

  • 2023-06-12T00:55:14,740 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MemoryError

Third exception:

  • 2023-06-12T00:55:14,739 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/ml/model/code/inference.py", line 5, in model_fn
  • 2023-06-12T00:55:14,740 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 243, in handle
  • 2023-06-12T00:55:14,741 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise PredictionException(str(e), 400)

Answer 1

Score: 3

After much more digging around, I realized there were several things going quite wrong in the above implementation. In case it helps anyone in the future, a non-exhaustive list follows:

  • input_fn() receives bytes, so it must implement some kind of decoding for the input data to end up in a workable format. In my very simple case, json.loads() does the trick.
  • Relatedly, I believe output_fn() needs to implement some kind of encoding so the result can be streamed back to the caller (a minimal sketch of both handlers follows this list).
  • While I packaged the Instructor model, I never actually used it. Instead, in model_fn() I simply initialized INSTRUCTOR, which is imported in that same file, meaning the packaged version of the model is left completely unused. However, Instructor is a Python package, not a bare ML model. To actually use the resources packaged in the .tar.gz file, I would have to implement the Instructor encode() method within model_fn().
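
A minimal sketch of what the decoding/encoding handlers might look like, assuming JSON in and JSON out (predict_fn stays as in the question):

import json

def input_fn(input_data, content_type):
    # The request body arrives as bytes/str; decode the JSON payload into a dict.
    return json.loads(input_data)

def output_fn(prediction, accept):
    # Serialize the prediction dict back to a JSON string for the response.
    return json.dumps(prediction)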

Ultimately, Instructor is only marginally better than a model like E5-large-v2, but E5-large-v2 can be implemented in a straightforward fashion, so I'll be switching to using that model in the convenient way described in this guide from Hugging Face.
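
For reference, that convenient path looks roughly like this (a sketch, assuming the hub model id intfloat/e5-large-v2 with the feature-extraction task; the versions and instance type are only illustrative):

from sagemaker.huggingface.model import HuggingFaceModel

hub = {
    "HF_MODEL_ID": "intfloat/e5-large-v2",  # model id from huggingface.co/models
    "HF_TASK": "feature-extraction",        # task handled by the toolkit's default pipeline
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,                              # same IAM role as before
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

# E5 models expect inputs prefixed with "query: " or "passage: ".
predictor.predict({"inputs": "query: Which paper discusses anemia?"})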

A few related resources that may also be useful:
