Pytorch nn.CrossEntropyLoss() only returns -0.0

Question

Running the following code snippet
torch.nn.CrossEntropyLoss()(torch.Tensor([0]), torch.Tensor([1]))
returns
tensor(-0.)
How can this be? Am I missing something fundamental about this problem?

I have a super simple Feed-Forward NN model:

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

class FeedForwardNeuralNet(nn.Module):
    def __init__(self, S1, S2, S3):
        super(FeedForwardNeuralNet, self).__init__()
        self.linear1 = nn.Linear(S1, S2)
        self.linear2 = nn.Linear(S2, S3)

    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        return x

def train(data, S1, S2, S3, weight_decay, loss_fn, learning_rate=0.01):
    model = FeedForwardNeuralNet(S1, S2, S3)
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9, weight_decay=weight_decay)

    #py torch data loader
    for input, target in data:
        optimizer.zero_grad() # clear gradients
        output = model(input) # forward pass
        loss = loss_fn(output, target) # calculate loss
        print(loss.item())
        loss.backward() # calculate gradients
        optimizer.step() # update weights
    return model

And the loss remains at zero.

My data is in the shape:

X: [float, float]
Y: {0 or 1}

from sklearn.model_selection import train_test_split

X_train, X_rem, y_train, y_rem = train_test_split(X, y, train_size=0.8)
X_val, X_test, y_val, y_test = train_test_split(X_rem, y_rem, test_size=0.5)

X_train = torch.from_numpy(X_train).type(torch.FloatTensor)
X_val = torch.from_numpy(X_val).type(torch.FloatTensor)
X_test = torch.from_numpy(X_test).type(torch.FloatTensor)

y_train = torch.from_numpy(np.array(list(map(lambda a: [a], y_train)))).type(torch.FloatTensor)
y_val = torch.from_numpy(np.array(list(map(lambda a: [a], y_val)))).type(torch.FloatTensor)
y_test = torch.from_numpy(np.array(list(map(lambda a: [a], y_test)))).type(torch.FloatTensor)

train_data = torch.utils.data.TensorDataset(X_train, y_train)
validation_data = torch.utils.data.TensorDataset(X_val, y_val)
test_data = torch.utils.data.TensorDataset(X_test, y_test)

batch_size=1000
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=False)
val_loader = torch.utils.data.DataLoader(validation_data, batch_size=len(validation_data), shuffle=False) # use the entire val dataset in one batch
test_loader = torch.utils.data.DataLoader(test_data, batch_size=len(test_data), shuffle=False) # use the entire test dataset in one batch

I run train with the following parameters:
train(data=train_loader, S1=2, S2=S2, S3=1, weight_decay=0.1, loss_fn=nn.CrossEntropyLoss())

Answer 1

Score: 1

I have not looked at your code, so I am only responding to your question of why torch.nn.CrossEntropyLoss()(torch.Tensor([0]), torch.Tensor([1])) returns tensor(-0.).

From the documentation for torch.nn.CrossEntropyLoss (note that C = number of classes, N = number of instances):

> Note that target can be interpreted differently depending on its shape relative to input. If target.shape == input.shape, then target is interpreted as having class probabilities.

I think this is what is happening in your case: torch.nn.CrossEntropyLoss()(torch.Tensor([0]), torch.Tensor([1])) is 0 because the CrossEntropyLoss function is taking target to mean "The probability of class 0 should be 1". Then, since input is interpreted as containing logits, it's easy to see why the output is 0: you are telling the loss function that you want to do "unary classification", and any value for input will result in a zero cost for the loss function.
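To see this concretely, here is a minimal reproduction (assuming PyTorch ≥ 1.11, which accepts unbatched input and probability targets; the specific logit values are made up for illustration):

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss()

# input and target share the same shape, so target is read as class
# probabilities over a single "class". Softmax over one logit is always 1,
# log(1) = 0, so the loss is -1 * 0 = -0. no matter what the logit is.
for logit in [0.0, -5.0, 100.0]:
    print(loss_fn(torch.tensor([logit]), torch.tensor([1.0])))  # tensor(-0.)
```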

Probably what you want to do instead is to hand the loss function class labels.
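As a sketch of that fix (the two-class logits below are invented for illustration): with an integer class index as target and one logit per class as input, the loss is the usual negative log-softmax of the true class and is no longer trivially zero:

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss()

# Shape (N, C) = (1, 2): one example, two classes.
logits = torch.tensor([[2.0, -1.0]])
# Shape (N,), dtype int64: the target is now a class index, not a probability.
labels = torch.tensor([0])

loss = loss_fn(logits, labels)
print(loss)  # -log softmax of class 0: strictly positive
```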

(BTW, beware using torch.Tensor, as it actually means torch.FloatTensor, which you might not have wanted.)
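The dtype difference is easy to check:

```python
import torch

# torch.Tensor is an alias for torch.FloatTensor, so integer data is
# silently stored as float32 -- which CrossEntropyLoss then interprets
# as probabilities rather than class indices.
print(torch.Tensor([1]).dtype)   # torch.float32

# torch.tensor (lowercase) infers the dtype from the data instead.
print(torch.tensor([1]).dtype)   # torch.int64
```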

huangapple
  • Published on April 11, 2023 at 02:49:40
  • Please retain this link when reposting: https://go.coder-hub.com/75979836.html