What will the gradient and weights of a particular part of the network be if the coefficient of one of the losses contributed by that part is zero?
Question
Suppose we have a network consisting of two encoders (Enc1 and Enc2) and one decoder (Dec1), connected sequentially.
The first encoder (Enc1) contributes loss L1, the second encoder (Enc2) contributes loss L2, and the decoder (Dec1) contributes loss L3. The final loss L is:
L = coeff1 * L1 + coeff2 * L2 + coeff3 * L3
where coeff1, coeff2, and coeff3 are the weights of the individual losses.
Suppose we set coeff2 = 0 for the second encoder (Enc2). I have two questions:
(i) Will the gradient corresponding to L2 be zero?
(ii) Will the weights of the second encoder (Enc2) change, given that coeff2 is 0?
Answer 1
Score: 1
Let's assume that coeff2 = 0:

- > Will the gradient corresponding to L2 be zero?

  Yes. The L2 term then has no effect on the network: its contribution to the gradient of the total loss, coeff2 * dL2/d(θ), is 0 for every parameter θ, whether it belongs to Enc1, Enc2, or Dec1. Note that dL2/d(θ) itself need not be zero; it is simply multiplied by zero.

- > Will the weights of the second encoder Enc2 change?

  - If L1 and L3 do not depend (mathematically speaking) on the parameters of the second encoder (Enc2), then the gradient with respect to those parameters is zero, because d(L1+L3)/d(θ_Enc2) = 0, and a plain gradient step leaves the weights of Enc2 unchanged.
  - If, however, these parameters are involved in computing either L1 or L3, then the gradient will generally not be zero. In that case we have |d(L1)/d(θ_Enc2)| > 0 or |d(L3)/d(θ_Enc2)| > 0, and the weights of Enc2 are still updated. If Dec1 consumes the output of Enc2, as the sequential setup in the question suggests, then L3 depends on θ_Enc2 and this is the case that applies.
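
As a rough illustration of the two cases, here is a minimal sketch in plain Python. It assumes a toy model that is not from the question: each module is a single scalar weight (w1, w2, w3), each loss is a squared error against an arbitrary target, and dL/dw2 is estimated with central finite differences. The names total_loss and grad_w2 are hypothetical.

```python
# Toy scalar model (an assumption for illustration only):
#   Enc1: h1 = w1 * x
#   Enc2: h2 = w2 * h1
#   Dec1: y  = w3 * h2
# Each loss is a squared error against an arbitrary target.

def total_loss(w1, w2, w3, x, coeffs, targets):
    c1, c2, c3 = coeffs
    t1, t2, t3 = targets
    h1 = w1 * x          # output of Enc1
    h2 = w2 * h1         # output of Enc2
    y = w3 * h2          # output of Dec1
    l1 = (h1 - t1) ** 2  # loss attached to Enc1
    l2 = (h2 - t2) ** 2  # loss attached to Enc2
    l3 = (y - t3) ** 2   # loss attached to Dec1
    return c1 * l1 + c2 * l2 + c3 * l3


def grad_w2(w1, w2, w3, x, coeffs, targets, eps=1e-6):
    """Central finite-difference estimate of dL/dw2."""
    up = total_loss(w1, w2 + eps, w3, x, coeffs, targets)
    down = total_loss(w1, w2 - eps, w3, x, coeffs, targets)
    return (up - down) / (2 * eps)


args = dict(w1=0.5, w2=-1.0, w3=2.0, x=1.0, targets=(1.0, 1.0, 1.0))

# coeff2 = 0: L2 contributes nothing, but L3 still depends on w2 through Dec1.
g_seq = grad_w2(coeffs=(1.0, 0.0, 1.0), **args)

# coeff2 = coeff3 = 0: only L1 remains, and L1 does not depend on w2.
g_l1_only = grad_w2(coeffs=(1.0, 0.0, 0.0), **args)

print("dL/dw2 with coeff2=0:", g_seq)
print("dL/dw2 with coeff2=coeff3=0:", g_l1_only)
```

With these values the first gradient is nonzero (about -4, coming entirely from L3), so Enc2's weight would still be updated even though coeff2 = 0. The second gradient is zero because the only remaining loss, L1, does not involve w2.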