What will be the gradient and weights of a particular part of the network if the coefficient of one of the losses contributed by that part is zero?


Question


Suppose we have a network consisting of two encoders (Enc1 and Enc2) and one decoder (Dec1), connected in sequential order.

The first encoder (Enc1) contributes a loss L1, the second encoder (Enc2) contributes a loss L2, and the decoder (Dec1) contributes a loss L3. The final loss L is then:

L = coeff1 * L1 + coeff2 * L2 + coeff3 * L3

coeff1, coeff2, and coeff3 are the weights of the individual losses.

Suppose we set coeff2 = 0 for the second encoder (Enc2). I then have two questions:

(i) Will the gradient corresponding to L2 be zero?
(ii) Will the weights of the second encoder (Enc2) still change, given that coeff2 is 0?

Answer 1

Score: 1


Let's assume that coeff2 = 0:

  1. > Whether gradient will be zero corresponding to L2 or not?

    Then the L2 term has no effect on the network: its contribution to the gradient of the total loss L is coeff2 * dL2/d(θ) = 0 for every parameter θ, in particular for the parameters of both encoders, Enc1 and Enc2.

  2. > Whether the weight of the second encoder Enc2 will change?

    • If L1 and L3 do not depend (mathematically speaking) on the parameters of the second encoder (Enc2), then the gradient with respect to those parameters is zero and Enc2's weights receive no gradient update. That's because d(L1+L3)/d(θ_Enc2) = 0.

    • If, however, these parameters were involved in computing either L1 or L3, then the gradient will in general not be zero. In this case, we have |d(L1)/d(θ_Enc2)| > 0 or |d(L3)/d(θ_Enc2)| > 0, and the weights of Enc2 will still change.
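The two cases can be checked numerically. The following is only a minimal sketch, not the asker's actual network: it assumes a toy scalar model in which each of Enc1, Enc2, and Dec1 is a single multiplication, each loss is a squared error against a made-up target, and the gradient with respect to Enc2's weight is estimated by central finite differences. All names (`total_loss`, `grad_w2`, `w1`, `w2`, `w3`, the targets) are hypothetical. It relies on the decomposition dL/d(θ_Enc2) = coeff1 * dL1/d(θ_Enc2) + coeff2 * dL2/d(θ_Enc2) + coeff3 * dL3/d(θ_Enc2).

```python
def total_loss(w1, w2, w3, coeffs, x=1.5, targets=(1.0, 2.0, 3.0)):
    """Toy scalar stand-in for Enc1 -> Enc2 -> Dec1 (hypothetical model)."""
    c1, c2, c3 = coeffs
    t1, t2, t3 = targets
    h1 = w1 * x            # Enc1 output
    h2 = w2 * h1           # Enc2 output, consumes Enc1's output
    y = w3 * h2            # Dec1 output, consumes Enc2's output
    l1 = (h1 - t1) ** 2    # L1 depends on w1 only
    l2 = (h2 - t2) ** 2    # L2 depends on w1 and w2
    l3 = (y - t3) ** 2     # L3 depends on w1, w2 and w3
    return c1 * l1 + c2 * l2 + c3 * l3


def grad_w2(coeffs, w1=0.5, w2=-0.3, w3=0.8, eps=1e-6):
    """Central finite-difference estimate of dL/dw2 (Enc2's weight)."""
    up = total_loss(w1, w2 + eps, w3, coeffs)
    down = total_loss(w1, w2 - eps, w3, coeffs)
    return (up - down) / (2 * eps)


g_all = grad_w2((1.0, 1.0, 1.0))      # all three losses active
g_no_l2 = grad_w2((1.0, 0.0, 1.0))    # coeff2 = 0, L3 still reaches Enc2
g_only_l1 = grad_w2((1.0, 0.0, 0.0))  # only L1 left, which ignores w2

print("all losses:       ", g_all)
print("coeff2 = 0:       ", g_no_l2)
print("coeff2=coeff3=0:  ", g_only_l1)
```

In this sketch, setting coeff2 = 0 removes only the L2 part of the gradient. Enc2's weight still receives a nonzero gradient through L3, because the decoder consumes Enc2's output. The gradient vanishes only once no remaining loss term depends on `w2`. Whether the same holds in a real model depends on how L1 and L3 are actually computed there.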

huangapple
  • Posted on April 4, 2023 at 15:44:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75926733.html