Loss function giving nan in pytorch

Question

In PyTorch, I have a loss function that includes 1/x plus a few other terms. The last layer of my neural net is a sigmoid, so the values will be between 0 and 1.

Some value fed to 1/x must be getting really small at some point, because my loss has become this:

loss: 11.047459  [729600/235474375]
loss: 9.348356  [731200/235474375]
loss: 7.184393  [732800/235474375]
loss: 8.699876  [734400/235474375]
loss: 7.178806  [736000/235474375]
loss: 8.090066  [737600/235474375]
loss: 12.415799  [739200/235474375]
loss: 10.422441  [740800/235474375]
loss: 8.335846  [742400/235474375]
loss:     nan  [744000/235474375]
loss:     nan  [745600/235474375]
loss:     nan  [747200/235474375]
loss:     nan  [748800/235474375]
loss:     nan  [750400/235474375]

I'm wondering if there's any way to "rewind" when a nan is hit, or to define the loss function so that it never happens? Thanks!

Answer 1

Score: 3

Clip your loss to fall within a reasonable range to prevent gradient explosion (i.e., continually climbing out of the local neighborhood, as in BrockBrown's answer).

I'd recommend something like this:

    epsilon = 1e-01
    loss = 1/(x + epsilon)  # maximum value bounded at 1/epsilon = 10

or:

    eta = 5  # maximum loss value
    loss = torch.clamp(loss, max=eta)

As a third option, you can simply check for nan values and skip backpropagation in those cases. This won't help with stability; it just discards outlier loss values, so you still may not get good convergence depending on when these values occur.

    if loss.isnan().sum() == 0:  # no nan values
        loss.backward()
        optimizer.step()

I'd also note that if your loss is 1/x and x is bounded between 0 and 1, you can never achieve a loss below 1; generally your loss should be 0 for a perfect output. Combining this fact with the previous ideas:

    loss = 1/(epsilon + x) - 1/(1 + epsilon)

When x = 1 the loss is 0, and when x = 0 the loss is approximately 1/epsilon. You can adjust epsilon to keep the loss within a desired range for stability.
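Putting these ideas together, here is a minimal, self-contained sketch of the stabilized loss. The function name stable_inverse_loss is my own placeholder; epsilon and eta reuse the example values from above:

    import torch

    def stable_inverse_loss(x, epsilon=1e-1, eta=5.0):
        # 1/x-style loss with an epsilon offset, shifted so a perfect
        # output (x = 1) gives 0, then clamped to a maximum of eta.
        loss = 1 / (x + epsilon) - 1 / (1 + epsilon)
        return torch.clamp(loss, max=eta)

    # Quick check at the extremes of the sigmoid's output range:
    print(stable_inverse_loss(torch.tensor(1.0)))  # tensor(0.)  -> perfect output
    print(stable_inverse_loss(torch.tensor(0.0)))  # tensor(5.)  -> ~9.09 before clamping

Note that clamping zeroes the gradient wherever the loss sits at the cap, so eta is a trade-off between stability and how much signal you keep from very bad outputs.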

Answer 2

Score: 2

Your loss is jumping all over the place instead of steadily decreasing. Have you tried decreasing your learning rate? It looks like it's jumping across the minimum, bouncing back and forth. This can happen if the learning rate is too high.

To answer your question about rewinding: ideally you shouldn't have to rewind; the loss should be steadily decreasing. You may also want to look into learning rate schedulers.
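For instance, ReduceLROnPlateau drops the learning rate automatically when the loss stops improving. Here is a minimal sketch, where the model, data, and hyperparameter values are placeholders purely for illustration:

    import torch

    # Placeholder model, optimizer, and data purely for illustration.
    model = torch.nn.Sequential(torch.nn.Linear(10, 1), torch.nn.Sigmoid())
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    inputs, targets = torch.randn(64, 10), torch.rand(64, 1)

    # Cut the learning rate by 10x when the loss hasn't improved for 5 epochs.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=5
    )

    for epoch in range(50):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
        loss.backward()
        optimizer.step()
        scheduler.step(loss.item())  # the scheduler reacts to the observed loss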
