Specifying Huggingface model as project dependency
Question
Is it possible to install Hugging Face models as a project dependency?
Currently the model is downloaded automatically by the SentenceTransformer library, but this means that in a Docker container it is downloaded every time the container starts.
This is the model I am trying to use: https://huggingface.co/sentence-transformers/all-mpnet-base-v2
I have tried specifying the URL as a dependency in my pyproject.toml:
all-mpnet-base-v2 = {git = "https://huggingface.co/sentence-transformers/all-mpnet-base-v2.git", branch = "main"}
The first error I got said that the name was incorrect and that it should be called train-script, so I renamed the dependency, although I'm not sure this is correct. Now I have:
train-script = {git = "https://huggingface.co/sentence-transformers/all-mpnet-base-v2.git", branch = "main"}
However, now I get the following error:
Package operations: 1 install, 0 updates, 0 removals
• Installing train-script (0.0.0 bd44305)
EnvCommandError
Command ['/srv/.venv/bin/pip', 'install', '--no-deps', '-U', '/srv/.venv/src/train-script'] errored with the following return code 1, and output:
ERROR: Directory '/srv/.venv/src/train-script' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
[notice] A new release of pip available: 22.2.2 -> 22.3.1
[notice] To update, run: pip install --upgrade pip
at /usr/local/lib/python3.10/site-packages/poetry/utils/env.py:1183 in _run
1179│ output = subprocess.check_output(
1180│ cmd, stderr=subprocess.STDOUT, **kwargs
1181│ )
1182│ except CalledProcessError as e:
→ 1183│ raise EnvCommandError(e, input=input_)
1184│
1185│ return decode(output)
1186│
1187│ def execute(self, bin, *args, **kwargs):
Is this possible? If not, is there a recommended way to bake the model download into a docker container so it doesn't need to be downloaded each time?
Answer 1
Score: 1
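I was not able to find a native way to do this with project dependency files, so I did it with a multi-stage Dockerfile.
The first stage clones the model repository; the final stage then copies the files into the appropriate folder under /root/.cache/torch/, which is where the SentenceTransformer library looked for the model in my setup.
Here is an example: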
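# Stage 1: fetch the model files (git-lfs is needed for the large weight files)
FROM python:3.10.3 AS model-download-stage
RUN apt-get update && apt-get install -y git-lfs
RUN git lfs install
RUN git clone https://huggingface.co/sentence-transformers/all-mpnet-base-v2 /tmp/model
# Drop the git history so it is not copied into the final image
RUN rm -rf /tmp/model/.git
# Stage 2: final image with the model placed in the SentenceTransformer cache folder
FROM python:3.10.3
COPY --from=model-download-stage /tmp/model /root/.cache/torch/sentence_transformers/sentence-transformers_all-mpnet-base-v2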
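To check at startup that the model really was baked into the image, something like the following Python sketch might help. It assumes the same cache layout as the Dockerfile above (the model ID with "/" replaced by "_" under /root/.cache/torch/sentence_transformers); newer sentence-transformers releases may use a different cache location, so treat the path as an assumption to verify. The helper names cached_model_dir and load_baked_model are hypothetical, not part of any library.

```python
from pathlib import Path

# Assumption: same folder layout as the COPY target in the Dockerfile above.
# Newer sentence-transformers releases may cache models elsewhere.
DEFAULT_CACHE_ROOT = Path("/root/.cache/torch/sentence_transformers")


def cached_model_dir(model_id: str, cache_root: Path = DEFAULT_CACHE_ROOT) -> Path:
    """Map a model ID such as 'org/name' to '<cache_root>/org_name'."""
    return cache_root / model_id.replace("/", "_")


def load_baked_model(model_id: str, cache_root: Path = DEFAULT_CACHE_ROOT):
    """Load the model from the baked-in folder, failing fast if it is missing."""
    model_dir = cached_model_dir(model_id, cache_root)
    if not model_dir.is_dir():
        raise FileNotFoundError(f"Model folder not found: {model_dir}")
    # Imported lazily so the path helper works without the library installed.
    from sentence_transformers import SentenceTransformer

    # Passing a local directory path loads the model without contacting the Hub.
    return SentenceTransformer(str(model_dir))


if __name__ == "__main__":
    print(cached_model_dir("sentence-transformers/all-mpnet-base-v2"))
```

Loading from an explicit directory rather than by model name makes a missing or misplaced folder show up as an immediate error instead of a silent re-download.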