Loss function giving nan in pytorch
Question
In PyTorch, I have a loss function of 1/x plus a few other terms. The last layer of my neural net is a sigmoid, so the values will be between 0 and 1. Some value fed to 1/x must get really small at some point, because my loss has become this:
loss: 11.047459 [729600/235474375]
loss: 9.348356 [731200/235474375]
loss: 7.184393 [732800/235474375]
loss: 8.699876 [734400/235474375]
loss: 7.178806 [736000/235474375]
loss: 8.090066 [737600/235474375]
loss: 12.415799 [739200/235474375]
loss: 10.422441 [740800/235474375]
loss: 8.335846 [742400/235474375]
loss: nan [744000/235474375]
loss: nan [745600/235474375]
loss: nan [747200/235474375]
loss: nan [748800/235474375]
loss: nan [750400/235474375]
I'm wondering if there's any way to "rewind" if nan is hit, or to define the loss function so that it's never hit? Thanks!
Answer 1
Score: 3
Clip your loss to fall within a reasonable range to prevent gradient explosion (i.e. continually climbing out of the local neighborhood, as in BrockBrown's answer).
I'd recommend something like this:
epsilon = 1e-01
loss = 1/(x + epsilon) # bounded maximum value of 10
or:
eta = 5 # maximum loss value
loss = torch.clamp(loss, max=eta)
As a third option, you can simply check for nan values and not backpropagate in these cases. This won't help with stability; it just skips outlier loss values, so you still may not get good convergence depending on when these values occur.
if loss.isnan().sum() == 0:  # no nan values
    loss.backward()
    optimizer.step()
I'd also note that if your loss is 1/x, and x is bounded between 0 and 1, you can't ever achieve a loss less than 1. Generally your loss should be 0 for a perfect output. Combining this fact with the previous ideas:
loss = 1/(epsilon + x) - (1/(1+epsilon))
When x = 1, the loss approaches 0, and when x = 0, the loss is approximately 1/epsilon. You can adjust epsilon to attenuate your loss within a desired range for stability.
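For reference, a minimal runnable sketch of this shifted loss; the function name inverse_loss and the sample tensor are illustrative assumptions, not part of the original answer:

import torch

def inverse_loss(x, epsilon=0.1):
    # Shifted 1/x loss: roughly 1/epsilon when x = 0, exactly 0 when x = 1.
    return 1 / (x + epsilon) - 1 / (1 + epsilon)

x = torch.tensor([0.0, 0.5, 1.0])
print(inverse_loss(x))  # approximately tensor([9.0909, 0.7576, 0.0000])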
Answer 2
Score: 2
Your loss is jumping all over the place instead of steadily decreasing. Have you tried decreasing your learning rate? It looks like it's jumping across the minimum, bouncing back and forth. This can happen if the learning rate is too high.
To answer your question about rewinding: ideally you shouldn't have to rewind; the loss should be steadily decreasing. You may also want to look into learning rate schedulers.
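As a hedged sketch of that last suggestion, one common option is torch.optim.lr_scheduler.ReduceLROnPlateau, which lowers the learning rate when the monitored loss stops improving; the model, data, and loss below are placeholders, not the asker's actual setup:

import torch

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# Halve the learning rate if the monitored loss hasn't improved for 5 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)

for epoch in range(20):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sigmoid().mean()  # stand-in loss
    loss.backward()
    optimizer.step()
    scheduler.step(loss)  # ReduceLROnPlateau steps on the metric it monitors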