Pytorch nn.CrossEntropyLoss() only returns -0.0
Question
Running the following code snippet
torch.nn.CrossEntropyLoss()(torch.Tensor([0]), torch.Tensor([1]))
returns
tensor(-0.)
How can this be? Am I missing something fundamental about this problem?
I have a super simple Feed-Forward NN model:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

class FeedForwardNeuralNet(nn.Module):
    def __init__(self, S1, S2, S3):
        super(FeedForwardNeuralNet, self).__init__()
        self.linear1 = nn.Linear(S1, S2)
        self.linear2 = nn.Linear(S2, S3)

    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        return x

def train(data, S1, S2, S3, weight_decay, loss_fn, learning_rate=0.01):
    model = FeedForwardNeuralNet(S1, S2, S3)
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9, weight_decay=weight_decay)
    # PyTorch data loader
    for input, target in data:
        optimizer.zero_grad()           # clear gradients
        output = model(input)           # forward pass
        loss = loss_fn(output, target)  # calculate loss
        print(loss.item())
        loss.backward()                 # calculate gradients
        optimizer.step()                # update weights
    return model
And the loss remains at zero.
My data is in the shape:
X: [float, float]
Y: {0 or 1}
from sklearn.model_selection import train_test_split
X_train, X_rem, y_train, y_rem = train_test_split(X, y, train_size=0.8)
X_val, X_test, y_val, y_test = train_test_split(X_rem, y_rem, test_size=0.5)
X_train = torch.from_numpy(X_train).type(torch.FloatTensor)
X_val = torch.from_numpy(X_val).type(torch.FloatTensor)
X_test = torch.from_numpy(X_test).type(torch.FloatTensor)
y_train = torch.from_numpy(np.array(list(map(lambda a: [a], y_train)))).type(torch.FloatTensor)
y_val = torch.from_numpy(np.array(list(map(lambda a: [a], y_val)))).type(torch.FloatTensor)
y_test = torch.from_numpy(np.array(list(map(lambda a: [a], y_test)))).type(torch.FloatTensor)
train_data = torch.utils.data.TensorDataset(X_train, y_train)
validation_data = torch.utils.data.TensorDataset(X_val, y_val)
test_data = torch.utils.data.TensorDataset(X_test, y_test)
batch_size=1000
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=False)
val_loader = torch.utils.data.DataLoader(validation_data, batch_size=len(validation_data), shuffle=False) # use the entire val dataset in one batch
test_loader = torch.utils.data.DataLoader(test_data, batch_size=len(test_data), shuffle=False) # use the entire test dataset in one batch
I run train with the following parameters
train(data=train_loader, S1=2, S2=S2, S3=1, weight_decay=0.1, loss_fn=nn.CrossEntropyLoss())
Answer 1
Score: 1
I have not looked at your code, so I am only responding to your question of why torch.nn.CrossEntropyLoss()(torch.Tensor([0]), torch.Tensor([1])) returns tensor(-0.).
From the documentation for torch.nn.CrossEntropyLoss (note that C = number of classes, N = number of instances):

Note that target can be interpreted differently depending on its shape relative to input. If target.shape == input.shape, then target is interpreted as holding class probabilities.
I think this is what is happening in your case: torch.nn.CrossEntropyLoss()(torch.Tensor([0]), torch.Tensor([1])) is 0 because the CrossEntropyLoss function is taking target to mean "the probability of class 0 should be 1". Then, since input is interpreted as containing logits, it's easy to see why the output is 0: you are telling the loss function that you want to do "unary classification", and any value for input will result in a zero cost for the loss function.
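A minimal snippet (not from the original answer) confirming this: with a single-element input, the shape-matching float target is read as "class 0 has probability 1", and since the softmax over one logit is always 1, every possible logit yields zero loss:

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss()

# input and target share shape (C=1,), so target is read as a class
# probability. Softmax over a single logit is always 1, so the cross
# entropy -1 * log(1) is 0 regardless of the logit's value.
for logit in [-5.0, 0.0, 42.0]:
    loss = loss_fn(torch.tensor([logit]), torch.tensor([1.0]))
    print(loss.item())  # 0.0 every time
```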
Probably what you want to do instead is to hand the loss function class labels.
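As a sketch of what that could look like for the two-class problem in the question (this assumes the model is changed to emit two output units, which is not how the original post configures it), the targets become integer class indices of dtype long rather than float probabilities:

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss()

# Two-class logits for one instance, shape (N=1, C=2)
logits = torch.tensor([[2.0, -1.0]])
# Integer class index (int64), NOT a one-element float probability
label = torch.tensor([0])

loss = loss_fn(logits, label)
print(loss.item())  # small but nonzero, since class 0 already has the larger logit
```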
(BTW, beware of using torch.Tensor, as it actually means torch.FloatTensor, which you might not have wanted.)
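A quick illustration of that caveat: the legacy torch.Tensor constructor always produces float32, while the torch.tensor factory infers the dtype from the data:

```python
import torch

a = torch.Tensor([1])  # legacy constructor: always float32 (a FloatTensor)
b = torch.tensor([1])  # factory function: infers int64 from the Python int
print(a.dtype)  # torch.float32
print(b.dtype)  # torch.int64
```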