Hugging Face model deployment

Question

My question is about how one deploys a Hugging Face model. I recently downloaded the Falcon 7B Instruct model and ran it in Colab. However, when I load the model and ask it to generate text, it takes about 40 seconds to produce an output. I was wondering how these models are deployed in production so that they respond with low latency. I am new to MLOps, so I just want to explore. Also, what will it cost to deploy the model? What if many users are using the model simultaneously? How do I handle that? I would greatly appreciate a response.

The code I am using is from <https://huggingface.co/tiiuae/falcon-7b-instruct>.

Also, I am saving the model weights in Google Drive.

Answer 1

Score: 1

  • I was just wondering how we deploy these models in production then so that it gives us output with low latency.

You can download the model and use it locally to avoid any latency related to the Internet connection. Note that your input has to be processed, so it is normal for a response to take some time. To make it as quick as possible, you should run it on a GPU (typically dozens of times faster than a CPU).
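As a rough illustration of why the hardware matters: generation time scales with the number of tokens divided by decoding throughput. The tokens-per-second figures below are illustrative assumptions, not benchmarks of Falcon 7B.

```python
# Back-of-the-envelope latency estimate; the throughput numbers are
# made-up assumptions for illustration, not measured values.
def generation_latency_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a given decoding throughput."""
    return num_tokens / tokens_per_second

cpu_latency = generation_latency_seconds(100, 2.5)   # ~40 s, in line with the Colab run
gpu_latency = generation_latency_seconds(100, 25.0)  # ~4 s at an assumed GPU throughput
print(cpu_latency, gpu_latency)
```

This is why the same prompt that takes ~40 seconds on a Colab CPU can come back in a few seconds once the model sits on a GPU.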

  • Also, what will be the charges of deploying that model?

You can use the models for free; the weights are openly downloadable. Any charges come from the hardware you run them on.
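If you do rent a GPU instance to host the model, the cost arithmetic is straightforward. The hourly rate below is a hypothetical placeholder, not a real quote from any provider.

```python
# Illustrative hosting-cost estimate; the $1.00/hour rate is a made-up example.
def monthly_hosting_cost(hourly_rate_usd: float, hours_per_day: float = 24.0,
                         days: int = 30) -> float:
    """Cost of keeping an instance running for a month at a given hourly rate."""
    return hourly_rate_usd * hours_per_day * days

cost = monthly_hosting_cost(1.00)  # a $1/hour GPU running around the clock
print(cost)
```

Running the instance only on demand, or using a managed per-request service, changes this math considerably.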

  • What if many users are simultaneously using this model?

I have never run into this kind of issue myself, but if you want to avoid any problems, you can, once again, download the model and load it from local storage.
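One common way to handle many simultaneous users with a single model instance is to serialize requests through a worker queue, so callers wait their turn instead of colliding on the model. A minimal sketch, assuming one process owns the model; the class and function names are illustrative, and `generate_fn` stands in for whatever actually calls the model:

```python
import queue
import threading

# Minimal sketch: one worker thread owns the model and processes requests
# one at a time, so concurrent callers queue up instead of colliding.
class InferenceWorker:
    def __init__(self, generate_fn):
        self._generate = generate_fn
        self._requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            prompt, result = self._requests.get()
            result["text"] = self._generate(prompt)
            result["done"].set()

    def submit(self, prompt, timeout=60.0):
        result = {"done": threading.Event()}
        self._requests.put((prompt, result))
        if not result["done"].wait(timeout):
            raise TimeoutError("generation took too long")
        return result["text"]

# Demo with a fake "model" that just echoes the prompt in upper case.
worker = InferenceWorker(lambda p: p.upper())
print(worker.submit("hello"))  # HELLO
```

Production serving stacks extend this idea with request batching and multiple model replicas, but the queue-per-model pattern is the core of it.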

huangapple
  • Posted on 2023-07-20 13:57:26
  • Please keep this link when reposting: https://go.coder-hub.com/76727040.html