Pytorch nn.CrossEntropyLoss() only returns -0.0
Question
Running the following code snippet
torch.nn.CrossEntropyLoss()(torch.Tensor([0]), torch.Tensor([1]))
returns
tensor(-0.)
How can this be? Am I missing something fundamental about this problem?
I have a super simple Feed-Forward NN model:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

class FeedForwardNeuralNet(nn.Module):
    def __init__(self, S1, S2, S3):
        super(FeedForwardNeuralNet, self).__init__()
        self.linear1 = nn.Linear(S1, S2)
        self.linear2 = nn.Linear(S2, S3)

    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        return x

def train(data, S1, S2, S3, weight_decay, loss_fn, learning_rate=0.01):
    model = FeedForwardNeuralNet(S1, S2, S3)
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9, weight_decay=weight_decay)
    # PyTorch data loader
    for input, target in data:
        optimizer.zero_grad()           # clear gradients
        output = model(input)           # forward pass
        loss = loss_fn(output, target)  # calculate loss
        print(loss.item())
        loss.backward()                 # calculate gradients
        optimizer.step()                # update weights
    return model
And the loss remains at zero.
My data is in the shape:
X: [float, float]
Y: {0 or 1}
from sklearn.model_selection import train_test_split
X_train, X_rem, y_train, y_rem = train_test_split(X, y, train_size=0.8)
X_val, X_test, y_val, y_test = train_test_split(X_rem, y_rem, test_size=0.5)
X_train = torch.from_numpy(X_train).type(torch.FloatTensor)
X_val = torch.from_numpy(X_val).type(torch.FloatTensor)
X_test = torch.from_numpy(X_test).type(torch.FloatTensor)
y_train = torch.from_numpy(np.array(list(map(lambda a: [a], y_train)))).type(torch.FloatTensor)
y_val = torch.from_numpy(np.array(list(map(lambda a: [a], y_val)))).type(torch.FloatTensor)
y_test = torch.from_numpy(np.array(list(map(lambda a: [a], y_test)))).type(torch.FloatTensor)
train_data = torch.utils.data.TensorDataset(X_train, y_train)
validation_data = torch.utils.data.TensorDataset(X_val, y_val)
test_data = torch.utils.data.TensorDataset(X_test, y_test)
batch_size=1000
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=False)
val_loader = torch.utils.data.DataLoader(validation_data, batch_size=len(validation_data), shuffle=False) # use the entire val dataset in one batch
test_loader = torch.utils.data.DataLoader(test_data, batch_size=len(test_data), shuffle=False) # use the entire test dataset in one batch
I run train with the following parameters:
train(data=train_loader, S1=2, S2=S2, S3=1, weight_decay=0.1, loss_fn=nn.CrossEntropyLoss())
Answer 1
Score: 1
I have not looked at your code, so I am only responding to your question of why torch.nn.CrossEntropyLoss()(torch.Tensor([0]), torch.Tensor([1])) returns tensor(-0.).
From the documentation for torch.nn.CrossEntropyLoss (note that C = number of classes, N = number of instances):
Note that target can be interpreted differently depending on its shape relative to input. If target.shape == input.shape, then target is interpreted as having class probabilities.
I think this is what is happening in your case: torch.nn.CrossEntropyLoss()(torch.Tensor([0]), torch.Tensor([1])) is 0 because the CrossEntropyLoss function is taking target to mean "The probability of class 0 should be 1". Then, since input is interpreted as containing logits, it's easy to see why the output is 0: you are telling the loss function that you want to do "unary classification", and any value for input will result in a zero cost for the loss function.
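A minimal sketch of this interpretation (assuming a PyTorch version that, like the asker's, accepts unbatched inputs and probability-style targets): with a single logit, the softmax is always 1, so the loss is zero no matter what value the input takes.

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss()

# target.shape == input.shape, so target is read as class probabilities.
# One logit means softmax(input) is always [1.], hence loss = -1 * log(1) = -0.
print(loss_fn(torch.tensor([0.0]), torch.tensor([1.0])))    # tensor(-0.)
print(loss_fn(torch.tensor([123.0]), torch.tensor([1.0])))  # tensor(-0.)
```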
Probably what you want to do instead is to hand the loss function class labels.
(BTW, beware using torch.Tensor, as it actually means torch.FloatTensor, which you might not have wanted.)
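As a sketch of what "hand the loss function class labels" could look like for a binary problem like this one (the tensors below are illustrative, not taken from the asker's data): either give the model two output logits and pass integer class indices to CrossEntropyLoss, or keep the single output unit and switch to BCEWithLogitsLoss.

```python
import torch
import torch.nn as nn

# Option A: two output logits per example + integer class indices.
# This is the shape nn.CrossEntropyLoss expects for classification.
logits = torch.randn(4, 2)           # batch of 4, C = 2 classes
labels = torch.tensor([0, 1, 1, 0])  # class indices (dtype long), not probabilities
print(nn.CrossEntropyLoss()(logits, labels).item())

# Option B: keep one output unit and use BCEWithLogitsLoss,
# the usual loss for binary classification with a single logit.
logit = torch.randn(4, 1)
target = torch.tensor([[0.0], [1.0], [1.0], [0.0]])  # float targets in {0, 1}
print(nn.BCEWithLogitsLoss()(logit, target).item())
```

Either way the loss is now strictly positive for finite logits, so gradients actually flow during training.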