In PyTorch, is it possible to freeze a module by a coefficient?


Question


In my experiment, I want to first train a low-level model (L), then reuse it in a higher-level task (H). Usually I would freeze model L when training H. But is it possible to not freeze L completely, but rather to freeze it by a coefficient, so to speak?

I'm sorry if this sounds mathematically imprecise, but if we assume that a non-frozen model is affected by the gradient at a scale of 1.0, and a frozen one at a scale of 0.0, I would love to be able to vary this coefficient, so that the module is not completely frozen (0.0) but is still partially affected by gradient descent (for example, by 0.1). It is still important, though, that model L fully affects the result of H. In other words, it affects the forward result at a scale of 1.0, but at the back-propagation stage it is only affected at a scale of 0.1.

The main idea behind this is for model L to get slightly tuned with respect to the high-level task.

I googled the question, and the best I came up with are these two threads, which I believe could contain a hint, but I still can't figure out how to have separate "weights" for the forward and backward passes:

  1. https://discuss.pytorch.org/t/multiply-a-model-by-trainable-scalar/76308
  2. https://discuss.pytorch.org/t/different-forward-and-backward-weights/52800/10 This one seems to answer the question, but it looks rather hacky and may be outdated. Maybe there are more established and up-to-date ways of doing this?
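
For reference, here is a minimal sketch (not taken from either thread) of one way to get this exact behaviour: an op that is the identity in the forward pass but multiplies the incoming gradient by a coefficient in the backward pass, built on torch.autograd.Function. The coefficient 0.1 and the names L, H, and inputs are illustrative placeholders.

import torch

class GradScale(torch.autograd.Function):
    # Identity in the forward pass; scales the gradient by `coeff` in the backward pass.
    @staticmethod
    def forward(ctx, x, coeff):
        ctx.coeff = coeff
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One gradient per forward input: grad w.r.t. x is scaled, `coeff` itself gets none.
        return grad_output * ctx.coeff, None

# Hypothetical usage: L is the low-level module, H the high-level head.
features = L(inputs)                        # L contributes to the forward result at full scale (1.0)
features = GradScale.apply(features, 0.1)   # gradients flowing back into L are scaled by 0.1
output = H(features)

Because every gradient reaching L's parameters passes through this node, scaling it here scales the effective updates of the whole module while leaving the forward contribution at 1.0.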

Answer 1

Score: 3


From what I understand, you're trying to specify a different learning rate for different parts of your model. PyTorch optimizers support that option directly:

import torch.optim as optim

# The default lr (1e-2) applies to model.base; model.L is updated with a smaller lr (1e-3)
optimizer = optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.L.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

From there, you can run a training loop as usual.
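
For completeness, a minimal sketch of such a loop, assuming the optimizer defined above plus a model, a data loader, and a loss function (the names here are placeholders, not part of the original answer):

import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()               # placeholder loss; use whatever fits task H

for inputs, targets in loader:                # `loader` is an assumed DataLoader for task H
    optimizer.zero_grad()                     # clear gradients accumulated in the previous step
    loss = loss_fn(model(inputs), targets)    # forward pass through the whole model
    loss.backward()                           # compute gradients for all parameters, including model.L
    optimizer.step()                          # updates use the per-group lrs: 1e-2 for base, 1e-3 for L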
