Specifying Huggingface model as project dependency


Question


Is it possible to install huggingface models as a project dependency?

Currently the model is downloaded automatically by the SentenceTransformer library, but this means that in a Docker container it is downloaded again every time the container starts.

This is the model I am trying to use: https://huggingface.co/sentence-transformers/all-mpnet-base-v2

I have tried specifying the URL as a dependency in my pyproject.toml:

```toml
all-mpnet-base-v2 = {git = "https://huggingface.co/sentence-transformers/all-mpnet-base-v2.git", branch = "main"}
```

The first error I got said the name was incorrect and that the package should be called train-script. I renamed the dependency accordingly, although I'm not sure this is correct. Now I have:

```toml
train-script = {git = "https://huggingface.co/sentence-transformers/all-mpnet-base-v2.git", branch = "main"}
```

However, now I get the following error:

```text
 Package operations: 1 install, 0 updates, 0 removals

   • Installing train-script (0.0.0 bd44305)

   EnvCommandError

   Command ['/srv/.venv/bin/pip', 'install', '--no-deps', '-U', '/srv/.venv/src/train-script'] errored with the following return code 1, and output:
   ERROR: Directory '/srv/.venv/src/train-script' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.

   [notice] A new release of pip available: 22.2.2 -> 22.3.1
   [notice] To update, run: pip install --upgrade pip


   at /usr/local/lib/python3.10/site-packages/poetry/utils/env.py:1183 in _run
       1179│                 output = subprocess.check_output(
       1180│                     cmd, stderr=subprocess.STDOUT, **kwargs
       1181│                 )
       1182│         except CalledProcessError as e:
     → 1183│             raise EnvCommandError(e, input=input_)
       1184│
       1185│         return decode(output)
       1186│
       1187│     def execute(self, bin, *args, **kwargs):
```

Is this possible? If not, is there a recommended way to bake the model download into the Docker image so it doesn't need to be downloaded each time?

Answer 1

Score: 1


I was not able to find a native way to do this with project dependency files, so I did it with a multi-stage Dockerfile.

First I clone the model in a separate build stage, then copy it into the folder under /root/.cache/torch/ where SentenceTransformer looks for a cached copy before downloading.

Here is an example:

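```dockerfile
FROM python:3.10.3 AS model-download-stage

# The model weights are stored with Git LFS, so git-lfs is required to clone them
RUN apt-get update && apt-get install -y git-lfs

RUN git lfs install

RUN git clone https://huggingface.co/sentence-transformers/all-mpnet-base-v2 /tmp/model
# Drop the Git history (including the LFS object store) so it is not copied into the final image
RUN rm -rf /tmp/model/.git

FROM python:3.10.3

# Cache folder used by sentence-transformers 2.2.x when running as root with no
# SENTENCE_TRANSFORMERS_HOME / TORCH_HOME override (version-dependent; verify for your install)
COPY --from=model-download-stage /tmp/model /root/.cache/torch/sentence_transformers/sentence-transformers_all-mpnet-base-v2
```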
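The COPY target only helps if it matches the folder the installed sentence-transformers version checks at start-up; if it does not, the model is silently downloaded again. The sketch below is a stdlib-only start-up check built on assumptions: it mirrors how, as far as we understand, sentence-transformers 2.2.x derives that folder (SENTENCE_TRANSFORMERS_HOME, else the torch cache home plus `sentence_transformers`, with `/` in the model id replaced by `_`, and `modules.json` marking a complete download). Newer releases reportedly cache under the Hugging Face hub cache instead, so verify these rules against your installed version. The function names and the `/models/all-mpnet-base-v2` directory are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def legacy_cache_dir(model_id: str) -> Path:
    """Folder that sentence-transformers 2.2.x is assumed to check before downloading."""
    root = os.getenv("SENTENCE_TRANSFORMERS_HOME")
    if root is None:
        # Assumed fallback: the torch hub home ($TORCH_HOME, else $XDG_CACHE_HOME/torch,
        # else ~/.cache/torch) plus a "sentence_transformers" subfolder.
        torch_home = os.getenv(
            "TORCH_HOME",
            os.path.join(os.getenv("XDG_CACHE_HOME", "~/.cache"), "torch"),
        )
        root = os.path.join(os.path.expanduser(torch_home), "sentence_transformers")
    # "sentence-transformers/all-mpnet-base-v2" -> "sentence-transformers_all-mpnet-base-v2"
    return Path(root) / model_id.replace("/", "_")


def is_complete_model_dir(path: Path) -> bool:
    # Assumption: a folder containing modules.json is treated as an already-downloaded model.
    return (path / "modules.json").is_file()


def resolve_model_source(model_id: str, baked_dir: Optional[str] = None) -> str:
    """Return a local model directory if one is present, otherwise the Hub id."""
    candidates = []
    if baked_dir:
        candidates.append(Path(baked_dir))  # hypothetical explicit location in the image
    candidates.append(legacy_cache_dir(model_id))
    for candidate in candidates:
        if is_complete_model_dir(candidate):
            return str(candidate)
    # Nothing baked in: passing the Hub id means the model is downloaded at start-up.
    return model_id


if __name__ == "__main__":
    source = resolve_model_source(
        "sentence-transformers/all-mpnet-base-v2", "/models/all-mpnet-base-v2"
    )
    print(f"SentenceTransformer would load from: {source}")
```

At start-up the result can be passed straight to `SentenceTransformer(...)`, which also accepts a local directory path. Copying the cloned repository to an explicit directory and passing that path directly avoids depending on the library's cache layout at all, which is the more robust choice if you upgrade sentence-transformers later.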

Posted by huangapple on January 9, 2023, 08:13:17.
Original link: https://go.coder-hub.com/75052206.html