Initializing two neural networks from the same class: first initialization influences the second
Question
I'm a beginner with PyTorch, and I'm attempting to implement a student-teacher architecture by initializing two networks with different hidden sizes from the same class. It seems that the first network's initialization influences the second one: specifically, I get different losses on the student network when the teacher network is initialized first, even though I'm training the student network independently of the teacher.
My NN class uses a Linear layer followed by a BatchNorm1d layer, and I'm initializing the BatchNorm weights using nn.init.uniform_. So I'm guessing this is what causes the first initialization to influence the second: either the BatchNorm layer or the Linear layer is keeping some running statistics from the first initialization.
I've tried resetting the running stats on the BatchNorm using reset_running_stats(), but that didn't change anything. Any ideas on how to solve this? Thanks.
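Roughly, the class looks something like this (a minimal sketch; the class name, sizes, and attribute names here are illustrative, not the actual code):

import torch
import torch.nn as nn

class Net(nn.Module):
    # Hypothetical reconstruction: a Linear layer followed by
    # BatchNorm1d, with the BatchNorm weights initialized via
    # nn.init.uniform_ (which draws from the global RNG).
    def __init__(self, in_features, hidden_size):
        super().__init__()
        self.fc = nn.Linear(in_features, hidden_size)
        self.bn = nn.BatchNorm1d(hidden_size)
        nn.init.uniform_(self.bn.weight)

    def forward(self, x):
        return self.bn(self.fc(x))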
Answer 1
Score: 0
Guaranteeing reproducible results when using neural networks is quite hard because of the sheer amount of randomness involved. However, one way to limit the sources of randomness is to set seeds.
This can be done in PyTorch with:
import torch
torch.manual_seed(seed) # seed is any number of your choice
You were probably getting different results depending on the order of initialization because both networks draw their initial weights from the same global random number generator: whichever network is instantiated first advances the RNG state, which changes the numbers the second one receives.
When dealing with multiple networks, try setting the seed right before instantiating each model so that they both receive the same numbers from the RNG. Something like:
torch.manual_seed(seed)
student = StudentNetwork()
torch.manual_seed(seed) # same seed as previous call
teacher = TeacherNetwork()
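A quick, self-contained way to check this order effect (using plain nn.Linear layers for illustration; StudentNetwork and TeacherNetwork would behave the same way, since all built-in layers draw their initial weights from the global RNG):

import torch
import torch.nn as nn

# Without re-seeding, the first model's initialization advances the
# global RNG, so the second model receives different weights.
torch.manual_seed(0)
a = nn.Linear(10, 4)
b = nn.Linear(10, 4)
print(torch.equal(a.weight, b.weight))  # False

# Re-seeding before each instantiation makes the draws identical.
torch.manual_seed(0)
a = nn.Linear(10, 4)
torch.manual_seed(0)
b = nn.Linear(10, 4)
print(torch.equal(a.weight, b.weight))  # True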