In PyTorch, is it possible to freeze a module by a coefficient?

Question

In my experiment, I want to first train a low-level model (L) and then reuse it in a higher-level task (H). Usually I would freeze model L while training H. But is it possible not to freeze L completely, but rather to freeze it by a coefficient, so to speak?

I'm sorry if this sounds mathematically imprecise, but if we assume that a non-frozen model is affected by the gradient at a scale of 1.0, and a frozen one at 0.0, I would love to be able to vary this coefficient, so that the module is not completely frozen (0.0) but is still partially affected by gradient descent (for example, at 0.1). It is still important, though, that model L fully affects the result of H. In other words, L affects the forward result at a scale of 1.0, but at the back-propagation stage it is only affected at a scale of 0.1.

The main idea behind this is for model L to get slightly fine-tuned with respect to the high-level task.

I googled the question, and the best I came up with were these two threads, which I believe contain a hint, but I still can't figure out how to use separate "weights" for the forward and backward passes (a sketch of the idea follows the links):

  1. https://discuss.pytorch.org/t/multiply-a-model-by-trainable-scalar/76308
  2. https://discuss.pytorch.org/t/different-forward-and-backward-weights/52800/10 This one seems to answer the question, but it looks rather hacky and may be outdated. Are there more established, up-to-date ways of doing this?
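
One way to express the intended behaviour (identity in the forward pass, scaled gradient in the backward pass) is a small custom torch.autograd.Function, similar in spirit to a gradient-reversal layer but with a positive coefficient. A minimal sketch, where GradScale, scale_grad, model_L and model_H are illustrative names rather than existing APIs:

import torch

class GradScale(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by `coeff` in the backward pass."""

    @staticmethod
    def forward(ctx, x, coeff):
        ctx.coeff = coeff
        return x.view_as(x)              # forward result is unchanged (scale 1.0)

    @staticmethod
    def backward(ctx, grad_output):
        # Scale the gradient flowing back into whatever produced `x`;
        # `coeff` is a plain float, so it gets no gradient of its own.
        return grad_output * ctx.coeff, None

def scale_grad(x, coeff=0.1):
    return GradScale.apply(x, coeff)

# Hypothetical usage inside H's forward pass:
#   features = model_L(inputs)             # L fully affects the output
#   features = scale_grad(features, 0.1)   # only 0.1 of the gradient reaches L
#   output = model_H(features)

Since every gradient that reaches L's parameters has to pass through L's output, scaling grad_output at that boundary scales all of L's parameter gradients by the same coefficient.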

Answer 1

Score: 3

From what I understand, you're trying to specify a different learning rate for different parts of your model. PyTorch optimizers support that option directly through parameter groups:

import torch.optim as optim

optimizer = optim.SGD([
    {'params': model.base.parameters()},            # uses the default lr (1e-2)
    {'params': model.L.parameters(), 'lr': 1e-3}    # smaller updates for the reused model L
], lr=1e-2, momentum=0.9)

From there, you can run a training loop as usual.
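
Continuing from the optimizer above, a minimal sketch of such a training loop; `dataloader`, `criterion`, and `model` (which contains `base` and `L` as sub-modules) are placeholders, not part of the original answer:

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()    # model.L is still updated, just with a 10x smaller step

For SGD, giving L a 10x smaller learning rate has the same effect as scaling its gradients by 0.1, so L still fully drives the forward pass while being only gently adjusted toward the high-level task.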
