"derivative for aten::linear_backward is not implemented" when calling backward() on mps in torch
Question
I'm working on a GAN to generate sounds. I copied most of the code from the wavegan-pytorch GitHub repository. I'm working on a MacBook with an M2 chip, so I wanted to shift the processing from the CPU to the GPU with MPS. But when I call `torch.Tensor.backward()` on my loss, I get an error saying that `linear_backward` is not implemented. I'm still pretty new to programming; is there a simple mistake that I'm overlooking, or is it just not possible to run the code on the GPU? Here's my code:
real_signal = next(self.train_loader)
# need to add mixed signal and flag
noise = sample_noise(batch_size * generator_batch_size_factor)
generated = self.generator(noise)
#############################
# Calculating discriminator loss and updating discriminator
#############################
self.apply_zero_grad()
disc_cost, disc_wd = self.calculate_discriminator_loss(
    real_signal.data, generated.data
)
assert not (torch.isnan(disc_cost))
disc_cost.backward()
self.optimizer_d.step()
I would be very glad for any help. Let me know if you need more info. I'm sorry in advance if there's a simple solution that I'm not getting, since I'm still new to this.
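For context, `device` (used in the function below) is set up roughly like this; it's a simplified sketch, and my actual setup may differ slightly:

```python
import torch

# Use the Apple-silicon GPU via the MPS backend when available, otherwise fall back to the CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
```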
Here is the code for the calculate_discriminator_loss() function:
def calculate_discriminator_loss(self, real, generated):
    disc_out_gen = self.discriminator(generated)
    disc_out_real = self.discriminator(real)
    alpha = torch.FloatTensor(batch_size * 2, 1, 1).uniform_(0, 1).to(device)
    alpha = alpha.expand(batch_size * 2, real.size(1), real.size(2))
    interpolated = (1 - alpha) * real.data + (alpha) * generated.data[:batch_size * 2]
    interpolated = Variable(interpolated, requires_grad=True)
    # calculate probability of interpolated examples
    prob_interpolated = self.discriminator(interpolated)
    grad_inputs = interpolated
    ones = torch.ones(prob_interpolated.size()).to(device)
    gradients = grad(
        outputs=prob_interpolated,
        inputs=grad_inputs,
        grad_outputs=ones,
        create_graph=True,
        retain_graph=True,
        only_inputs=True,
    )[0]
    # calculate gradient penalty
    grad_penalty = (
        p_coeff
        * ((gradients.view(gradients.size(0), -1).norm(2, dim=1) - 1) ** 2).mean()
    )
    assert not (torch.isnan(grad_penalty))
    assert not (torch.isnan(disc_out_gen.mean()))
    assert not (torch.isnan(disc_out_real.mean()))
    cost_wd = disc_out_gen.mean() - disc_out_real.mean()
    cost = cost_wd + grad_penalty
    return cost, cost_wd
Answer 1
Score: 1
Seeing you are implementing the discriminator loss calculation from a WGAN-GP, I thought I'd work out what was going wrong and improve what you have.
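For reference, the critic loss this function computes is the standard WGAN-GP objective, where $\lambda$ corresponds to `p_coeff` in your code and $\hat{x}$ is sampled uniformly along straight lines between real and generated examples:

$$
L_D = \underbrace{\mathbb{E}_{\tilde{x}\sim P_g}\!\big[D(\tilde{x})\big] - \mathbb{E}_{x\sim P_r}\!\big[D(x)\big]}_{\texttt{cost\_wd}}
\;+\; \underbrace{\lambda\,\mathbb{E}_{\hat{x}}\!\big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\big]}_{\texttt{grad\_penalty}}
$$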
First, you were doing absolutely great, with some slight flaws here and there. The problem is indeed in the `calculate_discriminator_loss` function. Things to improve:

- `Variable` is deprecated in recent versions of PyTorch. I'd recommend not using it because it is unsupported.
- You can index the `generated` and `real` tensors without accessing the `data` attribute, like so: `generated[:batch_size * 2]`.
- I am not sure what you are trying to do with the `batch_size * 2`. Is the generated batch bigger than the real batch of data? I would advise keeping them the same size.
- PyTorch has an `expand_as` function which is really useful here (instead of `expand` followed by spelling out the size of some tensor).
- When computing the gradients, you do not need `retain_graph=True`, as you don't compute gradients twice.
- When computing the gradients, you do not need `only_inputs=True`. It is deprecated, and the default is already `True`.
- `p_coeff` and `device` are not variables defined in the function. Make sure to define them in the class and then access them through `self.p_coeff` and `self.device` (see the sketch after this list).
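A minimal sketch of how those attributes could be defined in the trainer class (the class name and defaults here are hypothetical, not taken from the wavegan-pytorch code):

```python
import torch

class Trainer:  # hypothetical name, just to show where the attributes live
    def __init__(self, batch_size: int, p_coeff: float = 10.0):
        # Prefer the MPS backend on Apple silicon, otherwise fall back to the CPU.
        self.device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
        self.batch_size = batch_size
        self.p_coeff = p_coeff  # gradient-penalty coefficient (lambda in WGAN-GP)
```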
The following works when I run it:
def calculate_discriminator_loss(self, real, generated):
    assert real.shape == generated.shape
    disc_out_gen = self.discriminator(generated)
    disc_out_real = self.discriminator(real)
    alpha = torch.rand(self.batch_size, 1).to(self.device)
    alpha = alpha.expand_as(real)
    interpolated = (1 - alpha) * real + alpha * generated
    # calculate probability of interpolated examples
    prob_interpolated = self.discriminator(interpolated)
    ones = torch.ones(prob_interpolated.size()).to(self.device)
    gradients = grad(
        outputs=prob_interpolated,
        inputs=interpolated,
        grad_outputs=ones,
        create_graph=True)[0]
    # calculate gradient penalty
    grad_penalty = (
        torch.mean((gradients.view(gradients.size(0), -1).norm(2, dim=1) - 1) ** 2)
    )
    cost_wd = disc_out_gen.mean() - disc_out_real.mean()
    cost = cost_wd + grad_penalty
    return cost, cost_wd
Cleaned up your code a bit as well to be more readable and removed the asserts.
Hope this helps.