Pytorch nn.CrossEntropyLoss() only returns -0.0
Question
Running the following code snippet
torch.nn.CrossEntropyLoss()(torch.Tensor([0]), torch.Tensor([1]))
returns
tensor(-0.)
How can this be? Am I missing something fundamental about this problem?
I have a super simple Feed-Forward NN model:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

class FeedForwardNeuralNet(nn.Module):
    def __init__(self, S1, S2, S3):
        super(FeedForwardNeuralNet, self).__init__()
        self.linear1 = nn.Linear(S1, S2)
        self.linear2 = nn.Linear(S2, S3)

    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        return x

def train(data, S1, S2, S3, weight_decay, loss_fn, learning_rate=0.01):
    model = FeedForwardNeuralNet(S1, S2, S3)
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9, weight_decay=weight_decay)
    # PyTorch data loader
    for input, target in data:
        optimizer.zero_grad()           # clear gradients
        output = model(input)           # forward pass
        loss = loss_fn(output, target)  # calculate loss
        print(loss.item())
        loss.backward()                 # calculate gradients
        optimizer.step()                # update weights
    return model
And the loss remains at zero.
My data is in the shape:
X: [float, float]
Y: {0 or 1}
from sklearn.model_selection import train_test_split
X_train, X_rem, y_train, y_rem = train_test_split(X, y, train_size=0.8)
X_val, X_test, y_val, y_test = train_test_split(X_rem, y_rem, test_size=0.5)
X_train = torch.from_numpy(X_train).type(torch.FloatTensor)
X_val = torch.from_numpy(X_val).type(torch.FloatTensor)
X_test = torch.from_numpy(X_test).type(torch.FloatTensor)
y_train = torch.from_numpy(np.array(list(map(lambda a: [a], y_train)))).type(torch.FloatTensor)
y_val = torch.from_numpy(np.array(list(map(lambda a: [a], y_val)))).type(torch.FloatTensor)
y_test = torch.from_numpy(np.array(list(map(lambda a: [a], y_test)))).type(torch.FloatTensor)
train_data = torch.utils.data.TensorDataset(X_train, y_train)
validation_data = torch.utils.data.TensorDataset(X_val, y_val)
test_data = torch.utils.data.TensorDataset(X_test, y_test)
batch_size=1000
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=False)
val_loader = torch.utils.data.DataLoader(validation_data, batch_size=len(validation_data), shuffle=False) # use the entire val dataset in one batch
test_loader = torch.utils.data.DataLoader(test_data, batch_size=len(test_data), shuffle=False) # use the entire test dataset in one batch
I run train with the following parameters:
train(data=train_loader, S1=2, S2=S2, S3=1, weight_decay=0.1, loss_fn=nn.CrossEntropyLoss())
Answer 1
Score: 1
I have not looked at your code, so I am only responding to your question of why torch.nn.CrossEntropyLoss()(torch.Tensor([0]), torch.Tensor([1])) returns tensor(-0.).
From the documentation for torch.nn.CrossEntropyLoss (note that C = number of classes, N = number of instances):
Note that target can be interpreted differently depending on its shape relative to input. If target.shape == input.shape, then target is interpreted as having class probabilities.
I think this is what is happening in your case: torch.nn.CrossEntropyLoss()(torch.Tensor([0]), torch.Tensor([1])) is 0 because the CrossEntropyLoss function is taking target to mean "The probability of class 0 should be 1". Then, since input is interpreted as containing logits, it's easy to see why the output is 0: you are telling the loss function that you want to do "unary classification", and any value for input will result in a zero cost for the loss function.
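A minimal sketch of this interpretation (assuming a PyTorch version that, like the asker's, accepts unbatched inputs and probability-style targets): with a single logit, the softmax is always 1, so the loss is zero no matter what value the input takes.

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss()

# target.shape == input.shape, so target is read as class probabilities.
# One logit means softmax(input) is always [1.], hence loss = -1 * log(1) = -0.
print(loss_fn(torch.tensor([0.0]), torch.tensor([1.0])))    # tensor(-0.)
print(loss_fn(torch.tensor([123.0]), torch.tensor([1.0])))  # tensor(-0.)
```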
Probably what you want to do instead is to hand the loss function class labels.
(BTW, beware using torch.Tensor, as it actually means torch.FloatTensor, which you might not have wanted.)
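As a sketch of what "hand the loss function class labels" could look like for a binary problem like this one (the tensors below are illustrative, not taken from the asker's data): either give the model two output logits and pass integer class indices to CrossEntropyLoss, or keep the single output unit and switch to BCEWithLogitsLoss.

```python
import torch
import torch.nn as nn

# Option A: two output logits per example + integer class indices.
# This is the shape nn.CrossEntropyLoss expects for classification.
logits = torch.randn(4, 2)           # batch of 4, C = 2 classes
labels = torch.tensor([0, 1, 1, 0])  # class indices (dtype long), not probabilities
print(nn.CrossEntropyLoss()(logits, labels).item())

# Option B: keep one output unit and use BCEWithLogitsLoss,
# the usual loss for binary classification with a single logit.
logit = torch.randn(4, 1)
target = torch.tensor([[0.0], [1.0], [1.0], [0.0]])  # float targets in {0, 1}
print(nn.BCEWithLogitsLoss()(logit, target).item())
```

Either way the loss is now strictly positive for finite logits, so gradients actually flow during training.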