Use Hugging Face Instructor with AWS SageMaker
Question
I'm trying to use AWS SageMaker to get inferences from the Instructor model (i.e., to generate embeddings based on the Instructor model) but am running into various roadblocks upon calling predict() on the model class in SageMaker. What might be the issue with the below setup, and what is the appropriate way to structure a custom inference script like the one used below?
Some context:
- For question answering tasks, the Instructor model takes the following inputs: an instruction (e.g., "Represent the Medical title for retrieving") and a query string (e.g., "The title of a medical paper").
- Thus, when generating inferences using Instructor, I want to be able to pass in both an instruction and a query (a minimal usage sketch follows this list).
- As far as I understand, this means that I cannot use the nice utility that allows you to deploy models straight from Hugging Face (because there is no room in this approach for customizing instructions + queries).
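For context, here is a minimal sketch of how the InstructorEmbedding package pairs an instruction with a text when run locally, outside SageMaker; the instruction and query strings are simply the examples used elsewhere in this question:

from InstructorEmbedding import INSTRUCTOR

# Load the model and encode a single [instruction, text] pair.
model = INSTRUCTOR('hkunlp/instructor-xl')
instruction = "Represent the Medical title for retrieving relevant documents:"
query = "Which paper discuss anemia?"
embeddings = model.encode([[instruction, query]])
print(embeddings.shape)  # one embedding vector per [instruction, text] pair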
Currently, using this guide from Hugging Face as reference, I've set things up as below.
I download the Instructor repository, create a code subdirectory, and create inference.py and requirements.txt in that subdirectory.
inference.py contains:
from InstructorEmbedding import INSTRUCTOR
import torch

def model_fn(model_dir):
    model = INSTRUCTOR('hkunlp/instructor-xl')
    model.max_seq_length = 768
    return model

def input_fn(input_data, content_type):
    return input_data

def output_fn(prediction, accept):
    return prediction

def predict_fn(processed_data, model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    data = processed_data['data']
    embedding_instruction = processed_data['embedding_instruction']
    documents = data['documents']
    metadatas = data['metadatas']
    ids = data['ids']
    embeddings = model.encode([[embedding_instruction, doc] for doc in documents], device=device)
    # return dictionary, which will be JSON serializable
    return {"embeddings": embeddings.tolist(), 'metadatas': metadatas, 'ids': ids}
requirements.txt contains:
InstructorEmbedding>=1.0.1
transformers==4.20.0
datasets>=2.2.0
jsonlines
numpy
requests>=2.26.0
scikit_learn>=1.0.2
scipy
sentence_transformers>=2.2.0
torch
tqdm
rich
I then run the following from the model directory to package the model and custom inference code and upload it to an S3 bucket:
!tar zcvf model.tar.gz *
!aws s3 cp model.tar.gz $s3_location
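For what it's worth, the Hugging Face inference toolkit expects the custom code to sit under a code/ directory inside the archive, next to the model files at the root. A small Python equivalent of the packaging step, with illustrative (assumed) model file names:

import tarfile

# Model artifacts at the archive root, custom code under code/.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("config.json")              # model config (illustrative name)
    tar.add("pytorch_model.bin")        # model weights (illustrative name)
    tar.add("code/inference.py")        # custom handler functions
    tar.add("code/requirements.txt")    # extra dependencies installed at container start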
Finally, to create the model and deploy a realtime inference endpoint I run:
from sagemaker.huggingface.model import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_s3_location,   # path to model and script
    role=role,                      # iam role with permissions to create an endpoint
    transformers_version="4.26",    # transformers version used
    pytorch_version="1.13",         # pytorch version used
    py_version='py39',              # python version used
)
hf_predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.r5.xlarge")
query = 'Which paper discuss anemia?'
data = {'data': {'documents': [query],
                 'metadatas': ['NONE'],
                 'ids': ['NONE']},
        'embedding_instruction': "Represent the Medical title for retrieving relevant documents:"}
hf_predictor.predict(data)
The above throws the following error:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
"code": 400,
"type": "InternalServerException",
"message": ""
}
Looking into the logs, it appears that Instructor installs fine:
Successfully installed InstructorEmbedding-1.0.1 ...
And then there are these possibly relevant parts from three exceptions. Based on the below, it looks like inference.py is being read, but I'm not sure what to make of the other bits.
First exception:
2023-06-12T00:55:14,736 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - load INSTRUCTOR_Transformer
2023-06-12T00:55:14,737 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Traceback (most recent call last):
2023-06-12T00:55:14,737 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/transformers/modeling_utils.py", line 461, in load_state_dict
2023-06-12T00:55:14,738 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/torch/serialization.py", line 1079, in load_tensor
2023-06-12T00:55:14,738 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage).storage().untyped()
2023-06-12T00:55:14,738 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - OSError: [Errno 14] Bad address
Second exception:
2023-06-12T00:55:14,740 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MemoryError
Third exception:
2023-06-12T00:55:14,739 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/ml/model/code/inference.py", line 5, in model_fn
2023-06-12T00:55:14,740 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - File "/opt/conda/lib/python3.9/site-packages/sagemaker_huggingface_inference_toolkit/handler_service.py", line 243, in handle
2023-06-12T00:55:14,741 [INFO ] W-model-2-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - raise PredictionException(str(e), 400)
Answer 1
Score: 3
After much more digging around, I realized there were several things going quite wrong in the above implementation. In case it helps anyone in the future, a non-exhaustive list follows:
- input_fn() receives bytes, so it must implement some kind of decoding in order for the input data to be in a workable format. In my very simple case, json.loads() would do the trick (see the sketch after this list).
- Relatedly, I believe output_fn() needs to implement some kind of encoding in order for the data to be streamed back to the user.
- While I packaged the Instructor model, I never actually used it... Instead, in model_fn() I simply initialized INSTRUCTOR, which is imported in that same file, meaning the packaged version of the model is left completely unused. However, Instructor is a Python package, not a bare ML model. To actually use the resources packaged in the .tar.gz file, I would have to implement the Instructor encode() method within model_fn().
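Putting those three points together, one possible shape for the corrected script is sketched below. This is an illustration based on the points above, not code from the original post; it assumes the packaged model can be loaded directly from model_dir, since INSTRUCTOR subclasses SentenceTransformer and accepts a local path:

import json
import torch
from InstructorEmbedding import INSTRUCTOR

def model_fn(model_dir):
    # Load the model artifacts that were packaged into model.tar.gz,
    # instead of re-downloading 'hkunlp/instructor-xl' from the Hub.
    model = INSTRUCTOR(model_dir)
    model.max_seq_length = 768
    return model

def input_fn(input_data, content_type):
    # The handler receives the raw request body, so decode the JSON here.
    return json.loads(input_data)

def predict_fn(processed_data, model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    data = processed_data["data"]
    instruction = processed_data["embedding_instruction"]
    embeddings = model.encode(
        [[instruction, doc] for doc in data["documents"]], device=device
    )
    return {"embeddings": embeddings.tolist(),
            "metadatas": data["metadatas"],
            "ids": data["ids"]}

def output_fn(prediction, accept):
    # Encode the response so it can be returned to the caller.
    return json.dumps(prediction)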
Ultimately, Instructor is only marginally better than a model like E5-large-v2, but E5-large-v2 can be implemented in a straightforward fashion, so I'll be switching to using that model in the convenient way described in this guide from Hugging Face.
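For reference, that convenient path looks roughly like the following; the model ID, task name, and instance type here are assumptions for illustration, not taken from the original answer:

from sagemaker.huggingface.model import HuggingFaceModel

# Point the container at a Hub model instead of packaging a model.tar.gz.
hub = {
    "HF_MODEL_ID": "intfloat/e5-large-v2",  # model id on the Hugging Face Hub (assumed)
    "HF_TASK": "feature-extraction",        # pipeline task handled by the toolkit
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,  # same IAM role as above
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)
predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
# E5 models expect a "query: " / "passage: " prefix on the input text.
predictor.predict({"inputs": "query: Which paper discuss anemia?"})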
A few related links that may also be useful:
- https://stackoverflow.com/questions/71340893/when-i-get-a-prediction-from-sagemaker-endpoint-what-does-the-endpoint-do
- https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#bring-your-own-model
- https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-inference-container.html
- https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers-notebooks.html
- https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb
- https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html
- https://docs.aws.amazon.com/sagemaker/latest/dg/prebuilt-containers-extend.html