Non-differentiable loss function in Keras


Question

I'm currently trying to train an image segmentation model with Keras.
I want my model to return a mask (an image containing only 0s and 1s) that I can apply to the input image to keep only the interesting part.
When I train my model with an MSE loss, it returns masks with values significantly lower than 1, even though it seems to converge. So I implemented a custom loss function:

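    import tensorflow as tf

    def loss(y_true, y_pred):
        # Hard threshold: y_pred only feeds a boolean comparison here,
        # so no gradient flows back to it through this step.
        tresholded_pred = tf.where(y_pred >= 0.5, 1.0, 0.0)
        sq_diff = tf.square(y_true - tresholded_pred)
        return tf.reduce_mean(sq_diff, axis=-1)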

However I've got the following error:

    ValueError: No gradients provided for any variable

I assume this is because my function is not differentiable.
How can I achieve what I want without this error?

I've also tried to implement the thresholding with a Lambda layer, and it raised the exact same error.
I've been through a lot of similar topics, but none of the solutions has been satisfying so far.

Answer 1

Score: 1

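Your problem is that tf.where doesn't provide gradients in this situation: y_pred is used only in the comparison, and the constants 1.0 and 0.0 that tf.where selects between don't depend on y_pred, so there is nothing to differentiate.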
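
The missing gradient can be checked directly. The following is a minimal sketch, assuming TensorFlow 2.x with eager execution; the tensor values and the names `hard_loss` and `soft_loss` are made up for illustration and are not part of the question.

```python
import tensorflow as tf

# Hypothetical toy data: four "pixels" of a target mask and model outputs.
y_true = tf.constant([1.0, 0.0, 1.0, 0.0])
y_pred = tf.Variable([0.8, 0.3, 0.4, 0.6])

with tf.GradientTape(persistent=True) as tape:
    # Thresholded loss from the question: y_pred only enters through a
    # boolean comparison, so no gradient path leads back to it.
    hard = tf.where(y_pred >= 0.5, 1.0, 0.0)
    hard_loss = tf.reduce_mean(tf.square(y_true - hard))

    # Plain MSE on the continuous output keeps the gradient path intact.
    soft_loss = tf.reduce_mean(tf.square(y_true - y_pred))

hard_grad = tape.gradient(hard_loss, y_pred)
soft_grad = tape.gradient(soft_loss, y_pred)

print(hard_grad)  # None: there is nothing to backpropagate
print(soft_grad)  # a tensor with one entry per element of y_pred
```

When every trainable variable ends up with a `None` gradient like this, Keras has nothing to update, which is consistent with the error message in the question.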

However, you're misunderstanding a few things about neural networks:

  1. Your output is (and should be) continuous for exactly this reason. While training your model you want to know how far the output is from where you want it, not just that it is wrong. If you know how far away it is, you can slowly step towards it until all the values you want to be 1 are very close to 1, and all the values you want to be 0 are very close to zero. They'll (almost) never be exactly zero. You can read more about this here.
  2. While you shouldn't simply round your values to 0 or 1 while training your model, you should coax them towards those values using something like a sigmoid activation function applied to your output. This function maps large negative numbers to values close to 0 and large positive numbers to values close to 1, with a smooth, continuous transition in between.
  3. While you shouldn't round your values to 0 or 1 in your loss function while training, you can round the output of the model during prediction. This will give you the pure segmentation map you can then use as needed.
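
To make points 2 and 3 concrete, here is a small sketch in plain Python with no framework involved. The `logits` values are invented for illustration; in a real model they would be the raw outputs of the last layer.

```python
import math

def sigmoid(x):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw outputs (logits) for five pixels.
logits = [-6.0, -1.0, 0.0, 1.0, 6.0]

# During training, the loss sees these continuous probabilities.
probs = [sigmoid(z) for z in logits]

# Only at prediction time are they rounded into a hard 0/1 mask.
mask = [1 if p >= 0.5 else 0 for p in probs]

print([round(p, 3) for p in probs])
print(mask)
```

The probabilities approach 0 and 1 at the extremes but never reach them, which is why the rounding step belongs after training rather than inside the loss.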

Answer 2

Score: 0

> I assume this is because of the non-differentiability of my function. How can I achieve what I want without having such errors?

You cannot. Neural networks are (most of the time) trained with gradient-based methods (e.g. backpropagation). The function you defined has a gradient of zero almost everywhere (a step function is flat on both sides of the threshold), and thus can't be used. That's it.

That being said, I believe you are starting from the wrong assumption. The fact that you are effectively looking for a binary classification does not mean your loss has to be binary (your mask is nothing but a multi-label classification problem: each "pixel" of the mask is a binary classification of its own). In particular, typical binary classification does not binarise predictions during learning; you only do this during inference.

What you are looking for is the standard SigmoidCrossEntropy. Then during prediction you just threshold at 0.5.
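
A rough sketch of that recipe follows, assuming TensorFlow 2.x and taking `tf.keras.losses.BinaryCrossentropy(from_logits=True)` as the Keras counterpart of sigmoid cross-entropy (an assumption about which API the answer has in mind). The random data and the one-layer model are placeholders, not a real segmentation network.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 8 grayscale 16x16 "images" and binary masks.
rng = np.random.default_rng(0)
images = rng.random((8, 16, 16, 1)).astype("float32")
masks = (images > 0.5).astype("float32")

# Placeholder model: a single 1x1 convolution that outputs raw logits.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(16, 16, 1)),
    tf.keras.layers.Conv2D(1, kernel_size=1),
])

# Per-pixel sigmoid cross-entropy; from_logits=True applies the sigmoid
# inside the loss, so the model itself has no output activation.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
)
model.fit(images, masks, epochs=2, verbose=0)

# Inference: convert logits to probabilities, then threshold at 0.5.
probs = tf.sigmoid(model.predict(images, verbose=0))
pred_masks = tf.cast(probs >= 0.5, tf.float32).numpy()
```

The loss only ever sees continuous values, so gradients are available during training, while `pred_masks` contains the hard 0/1 mask the question asks for.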

huangapple
  • Posted on April 7, 2023, 03:16:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75953016.html