Use Hugging Face Instructor with AWS SageMaker
Question
I'm trying to use AWS SageMaker to get inferences from the Instructor model (i.e., to generate embeddings based on the Instructor model) but am running into various roadblocks upon calling predict() on the model class in SageMaker. What might be the issue with the below setup, and what is the appropriate way to structure a custom inference script like the one used below?
Some context:
- For question answering tasks, the Instructor model takes the following inputs: an instruction (e.g., "Represent the Medical title for retrieving") and a query string (e.g., "The title of a medical paper").
- Thus, when generating inferences using Instructor, I want to be able to pass in both an instruction and a query (a minimal usage sketch follows this list).
- As far as I understand, this means that I cannot use the nice utility that allows you to deploy models straight from Hugging Face (because there is no room in this approach for customizing instructions + queries).
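For context, here is a minimal sketch of how the InstructorEmbedding package pairs an instruction with a text when run locally, outside SageMaker; the instruction and query strings are simply the examples used elsewhere in this question:

from InstructorEmbedding import INSTRUCTOR

# Load the model and encode a single [instruction, text] pair.
model = INSTRUCTOR('hkunlp/instructor-xl')
instruction = "Represent the Medical title for retrieving relevant documents:"
query = "Which paper discuss anemia?"
embeddings = model.encode([[instruction, query]])
print(embeddings.shape)  # one embedding vector per [instruction, text] pair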
Currently, using this guide from Hugging Face as reference, I've set things up as below.
I download the Instructor repository, create a code subdirectory, and create inference.py and requirements.txt in that subdirectory.
inference.py contains:
from InstructorEmbedding import INSTRUCTOR
import torch

def model_fn(model_dir):
    model = INSTRUCTOR('hkunlp/instructor-xl')
    model.max_seq_length = 768
    return model

def input_fn(input_data, content_type):
    return input_data

def output_fn(prediction, accept):
    return prediction

def predict_fn(processed_data, model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    data = processed_data['data']
    embedding_instruction = processed_data['embedding_instruction']
    documents = data['documents']
    metadatas = data['metadatas']
    ids = data['ids']
    embeddings = model.encode([[embedding_instruction, doc] for doc in documents], device=device)
    # return dictionary, which will be JSON serializable
    return {"embeddings": embeddings.tolist(), 'metadatas': metadatas, 'ids': ids}
requirements.txt contains:
InstructorEmbedding>=1.0.1
transformers==4.20.0
datasets>=2.2.0
jsonlines
numpy
requests>=2.26.0
scikit_learn>=1.0.2
scipy
sentence_transformers>=2.2.0
torch
tqdm
rich
I then run the following from the model directory to package the model and custom inference code and upload it to an S3 bucket:
!tar zcvf model.tar.gz *
!aws s3 cp model.tar.gz $s3_location
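For what it's worth, the Hugging Face inference toolkit expects the custom code to sit under a code/ directory inside the archive, next to the model files at the root. A small Python equivalent of the packaging step, with illustrative (assumed) model file names:

import tarfile

# Model artifacts at the archive root, custom code under code/.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("config.json")              # model config (illustrative name)
    tar.add("pytorch_model.bin")        # model weights (illustrative name)
    tar.add("code/inference.py")        # custom handler functions
    tar.add("code/requirements.txt")    # extra dependencies installed at container start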
Finally, to create the model and deploy a realtime inference endpoint I run:
from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_s3_location,   # path to model and script
    role=role,                      # iam role with permissions to create an endpoint
    transformers_version="4.26",    # transformers version used
    pytorch_version="1.13",         # pytorch version used
    py_version='py39',              # python version used
)
hf_predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.r5.xlarge")
query = 'Which paper discuss anemia?'
data = {'data': {'documents': [query],
                 'metadatas': ['NONE'],
                 'ids': ['NONE']},
        'embedding_instruction': "Represent the Medical title for retrieving relevant documents:"}
hf_predictor.predict(data)
The above throws the following error:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": ""
}
Looking into the logs, it appears that Instructor installs fine:
Successfully installed InstructorEmbedding-1.0.1 ...
And then there are these possibly relevant parts from three exceptions. Based on the below, it looks like inference.py is being read, but I'm not sure what to make of the other bits.
First exception:
2023-06-12T00:55:14,736 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - load INSTRUCTOR_Transformer
2023-06-12T00:55:14,737 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
2023-06-12T00:55:14,737 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/modeling_utils.py", line 461, in load_state_dict
2023-06-12T00:55:14,738 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 1079, in load_tensor
2023-06-12T00:55:14,738 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage).storage().untyped()
2023-06-12T00:55:14,738 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - OSError: [Errno 14] Bad address
Second exception:
2023-06-12T00:55:14,740 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MemoryError
Third exception:
2023-06-12T00:55:14,739 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/ml/model/code/inference.py", line 5, in model_fn
2023-06-12T00:55:14,740 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 243, in handle
2023-06-12T00:55:14,741 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise PredictionException(str(e), 400)
Answer 1
Score: 3
After much more digging around, I realized there were several things going quite wrong in the above implementation. In case it helps anyone in the future, a non-exhaustive list follows:
- input_fn() receives bytes, so it must implement some kind of decoding in order for the input data to be in a workable format. In my very simple case, json.loads() would do the trick (see the sketch after this list).
- Relatedly, I believe output_fn() needs to implement some kind of encoding in order for the data to be streamed back to the user.
- While I packaged the Instructor model, I never actually used it... Instead, in model_fn() I simply initialized INSTRUCTOR, which is imported in that same file, meaning the packaged version of the model is left completely unused. However, Instructor is a Python package, not a bare ML model. To actually use the resources packaged in the .tar.gz file, I would have to implement the Instructor encode() method within model_fn().
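Putting those three points together, one possible shape for the corrected script is sketched below. This is an illustration based on the points above, not code from the original post; it assumes the packaged model can be loaded directly from model_dir, since INSTRUCTOR subclasses SentenceTransformer and accepts a local path:

import json
import torch
from InstructorEmbedding import INSTRUCTOR

def model_fn(model_dir):
    # Load the model artifacts that were packaged into model.tar.gz,
    # instead of re-downloading 'hkunlp/instructor-xl' from the Hub.
    model = INSTRUCTOR(model_dir)
    model.max_seq_length = 768
    return model

def input_fn(input_data, content_type):
    # The handler receives the raw request body, so decode the JSON here.
    return json.loads(input_data)

def predict_fn(processed_data, model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    data = processed_data["data"]
    instruction = processed_data["embedding_instruction"]
    embeddings = model.encode(
        [[instruction, doc] for doc in data["documents"]], device=device
    )
    return {"embeddings": embeddings.tolist(),
            "metadatas": data["metadatas"],
            "ids": data["ids"]}

def output_fn(prediction, accept):
    # Encode the response so it can be returned to the caller.
    return json.dumps(prediction)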
Ultimately, Instructor is only marginally better than a model like E5-large-v2, but E5-large-v2 can be implemented in a straightforward fashion, so I'll be switching to using that model in the convenient way described in this guide from Hugging Face.
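For reference, that convenient path looks roughly like the following; the model ID, task name, and instance type here are assumptions for illustration, not taken from the original answer:

from sagemaker.huggingface.model import HuggingFaceModel

# Point the container at a Hub model instead of packaging a model.tar.gz.
hub = {
    "HF_MODEL_ID": "intfloat/e5-large-v2",  # model id on the Hugging Face Hub (assumed)
    "HF_TASK": "feature-extraction",        # pipeline task handled by the toolkit
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,  # same IAM role as above
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)
predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
# E5 models expect a "query: " / "passage: " prefix on the input text.
predictor.predict({"inputs": "query: Which paper discuss anemia?"})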
A few related links that may also be useful:
- https://stackoverflow.com/questions/71340893/when-i-get-a-prediction-from-sagemaker-endpoint-what-does-the-endpoint-do
- https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#bring-your-own-model
- https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-inference-container.html
- https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers-notebooks.html
- https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb
- https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html
- https://docs.aws.amazon.com/sagemaker/latest/dg/prebuilt-containers-extend.html