Faster way to use saved PyTorch model (bypassing import torch?)

Question

I'm using a Slurm Workload Manager on a server and import torch takes around 30-40 seconds. The IT people running it said they couldn't do much to improve it and it was just hardware related (maybe they missed something? but I've gone through the internet before asking them and couldn't find much either). By comparison, import numpy takes around 1 second.

I would like to know if there is a way to use the saved weights of a PyTorch model to ONLY predict an output for a given input, without importing torch (so no need to import everything related to gradients, etc.). Theoretically, it is just matrix multiplications (I think?), so it should be feasible using only numpy? I need to do this several times in different jobs, so I cannot cache / pass around the imported torch, which is why I'm actively looking for a solution (but generally speaking, taking something from 30-40 seconds down to a few seconds is pretty cool anyway).
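
For the fully connected parts, what I mean by "just matrix multiplications" is something like this rough numpy sketch (assuming the weight matrix W and bias b have already been exported as numpy arrays; the names here are made up):

import numpy as np

def linear(x, W, b):
    # nn.Linear stores W with shape (out_features, in_features) and computes x @ W.T + b
    return x @ W.T + b

def relu(x):
    return np.maximum(x, 0.0)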

If that matters, here is the architecture of my model:

ActionNN(
  (conv_1): Conv2d(5, 16, kernel_size=(3, 3), stride=(1, 1))
  (conv_2): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
  (conv_3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
  (norm_layer_1): InstanceNorm2d(16, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
  (norm_layer_2): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
  (norm_layer_3): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
  (gap): AdaptiveAvgPool2d(output_size=(1, 1))
  (mlp): Sequential(
    (0): Linear(in_features=71, out_features=128, bias=True)
    (1): ReLU()
  )
  (layer): Linear(in_features=128, out_features=210, bias=True)
  (layer): Linear(in_features=128, out_features=210, bias=True)
  (layer): Linear(in_features=128, out_features=210, bias=True)
  (layer): Linear(in_features=128, out_features=210, bias=True)
  (layer): Linear(in_features=128, out_features=28, bias=True)
  (layer): Linear(in_features=128, out_features=28, bias=True)
  (layer): Linear(in_features=128, out_features=28, bias=True)
  (sigmoid): Sigmoid()
  (tanh): Tanh()
)
Number of parameters: 152284

If it were only fully connected layers, it would be "pretty easy", but because my network is a tiny bit more complex, I'm not sure how I should go about it.
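
Some of the non-convolutional pieces do look like they reduce to simple numpy operations; here is a rough sketch of InstanceNorm2d (affine=False) and the global average pooling for inputs of shape (N, C, H, W). The Conv2d layers are the part I am unsure about, since they would need an explicit sliding-window implementation:

import numpy as np

def instance_norm(x, eps=1e-5):
    # normalize each (sample, channel) plane over its spatial dims,
    # like InstanceNorm2d(affine=False, track_running_stats=False)
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)  # biased variance, matching PyTorch
    return (x - mean) / np.sqrt(var + eps)

def global_avg_pool(x):
    # AdaptiveAvgPool2d((1, 1)) followed by flattening to (N, C)
    return x.mean(axis=(2, 3))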

I saved the parameters using torch.save(my_network.state_dict(), my_path).
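
One idea (sketched below, not something I have validated): a one-time conversion script that does pay the import torch cost once and dumps the state_dict to a .npz file, so that the actual jobs only ever need numpy. The file name weights.npz and the key mlp.0.weight are just examples:

# one-time conversion: run once, still imports torch
import numpy as np
import torch

state_dict = torch.load(my_path, map_location="cpu")
np.savez("weights.npz", **{k: v.numpy() for k, v in state_dict.items()})

# later, inside the jobs: only numpy is needed
weights = np.load("weights.npz")
W, b = weights["mlp.0.weight"], weights["mlp.0.bias"]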

Since my script takes 35 seconds in total on average (import torch included), I would then be able to run it in a second or two on average, which would be great.

Here is my profiling of import torch:

         1226310 function calls (1209639 primitive calls) in 49.994 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1273   21.590    0.017   21.590    0.017 {method 'read' of '_io.BufferedReader' objects}
     5276   12.145    0.002   12.145    0.002 {built-in method posix.stat}
     1273    7.427    0.006    7.427    0.006 {built-in method io.open_code}
    45/25    5.631    0.125    9.939    0.398 {built-in method _imp.create_dynamic}
        2    0.564    0.282    0.564    0.282 {built-in method _ctypes.dlopen}
     1273    0.288    0.000    0.288    0.000 {built-in method marshal.loads}
       17    0.286    0.017    0.286    0.017 {method 'readline' of '_io.BufferedReader' objects}
2809/2753    0.098    0.000    0.546    0.000 {built-in method builtins.__build_class__}
   1620/1    0.062    0.000   49.997   49.997 {built-in method builtins.exec}
    50145    0.051    0.000    0.119    0.000 {built-in method builtins.getattr}
     1159    0.048    0.000    0.115    0.000 inspect.py:3245(signature)
      424    0.048    0.000    0.113    0.000 assumptions.py:596(__init__)
       13    0.039    0.003    0.039    0.003 {built-in method io.open}
     1411    0.035    0.000    0.045    0.000 library.py:71(impl)
     1663    0.034    0.000   12.209    0.007 <frozen importlib._bootstrap_external>:1536(find_spec)
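
(This is standard cProfile output; for reference, something along these lines reproduces it:)

import cProfile

# profile the import itself, sorted by internal time ("tottime")
cProfile.run("import torch", sort="tottime")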

Answer 1

Score: 2

There is an easy way to avoid paying the import cost every time: spin up a server, import torch only once at startup, and load the model only once.

Use Flask or, better yet, FastAPI to spin up a simple HTTP server that runs the inference on each HTTP call.

The server will take the ~40 seconds to start, but after that, any inference call only takes the time to connect and run the forward pass.

from fastapi import Request, FastAPI
import torch

# import torch and load the model once, when the server starts
# (if only a state_dict was saved, instantiate the model class and use load_state_dict instead)
model = torch.load(<yourmodel here>)
model.eval()

app = FastAPI()

@app.post("/predict")
async def inference(request: Request):
    data = await request.json()  # request.json() is a coroutine, so it must be awaited
    x = torch.tensor(data["input"], dtype=torch.float32)  # assumes the client sends {"input": ...}
    with torch.no_grad():  # inference only, no gradients needed
        prediction = model(x)
    return {"predictions": prediction.tolist()}

Call the server from whatever client you like by posting data to http://<your-host>:<port>/predict.
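
For example, a minimal client using the requests library (the host, port, and "input" payload key below are assumptions; they need to match whatever the server actually parses):

import requests

payload = {"input": [[0.0] * 10]}  # dummy payload; the real shape depends on the model's expected input
resp = requests.post("http://localhost:8000/predict", json=payload)
print(resp.json()["predictions"])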

See https://fastapi.tiangolo.com/tutorial/first-steps/ for more details.
