torch crossentropy loss calculation difference between 2D input and 3D input
Question
I am running a test on torch.nn.CrossEntropyLoss, using the example shown on the official page.
import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=False)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)
The output is 2.05.
In the example, both the input and the target are 2D tensors. Since in most NLP cases the input is a 3D tensor (and correspondingly the output would be 3D as well), I wrote a couple of lines of test code and found a weird issue.
input = torch.stack([input])
target = torch.stack([target])
output = loss(input, target)
The output is 0.9492.
This result really confuses me: apart from the extra dimension, the numbers inside the tensors are exactly the same. Does anyone know the reason for the difference?
The reason I am testing this method is that I am working on a project with Transformers.BartForConditionalGeneration. The loss is given in the model output and always has shape (1,), which is confusing: if my batch size is greater than 1, I would expect batch-size-many losses instead of just one. I took a look at the code, and it simply uses nn.CrossEntropyLoss(), so I suspect the issue is in nn.CrossEntropyLoss(). However, I am stuck inside that method.
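For context, the single value comes from the loss reduction rather than from the model: nn.CrossEntropyLoss defaults to reduction='mean', which averages the per-example losses into one scalar. A minimal sketch of the difference (the tensors below are made up purely to show the shapes):

import torch
import torch.nn as nn

# hypothetical tensors, only to illustrate the shapes involved
logits = torch.randn(4, 5)               # batch of 4 examples, 5 classes
labels = torch.randint(0, 5, (4,))

per_example = nn.CrossEntropyLoss(reduction='none')(logits, labels)
print(per_example.shape)                  # torch.Size([4]) -- one loss per example

mean_loss = nn.CrossEntropyLoss()(logits, labels)   # default reduction='mean'
print(mean_loss.shape)                    # torch.Size([]) -- a single scalar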
Answer 1
Score: 0
In the second case, you are adding an extra dimension, which means that the softmax on the logits tensor (input) ends up being applied along a different dimension: nn.CrossEntropyLoss always treats dim 1 as the class dimension, and after stacking, dim 1 is the size-3 axis rather than the 5-class axis.
Here we compute the two quantities separately:
>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=False)
>>> target = torch.randn(3, 5).softmax(dim=1)
First you have loss(input, target), which is identical to:
>>> o = -target*F.log_softmax(input, 1)
>>> o.sum(1).mean()
And your second scenario, loss(input[None], target[None]), is identical to:
>>> o = -target[None]*F.log_softmax(input[None], 1)
>>> o.sum(1).mean()
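Since nn.CrossEntropyLoss expects the class dimension at dim 1 (input shapes (N, C) or (N, C, d1, ...)), a sketch of how to recover the 2D result after stacking is to move the 5-class axis back to dim 1, reusing the same input and target as above:

>>> # 2D case: classes sit on dim 1, gives the original value
>>> loss(input, target)
>>> # 3D case with the class axis moved back to dim 1:
>>> loss(input[None].transpose(1, 2), target[None].transpose(1, 2))
>>> # matches the 2D value, unlike loss(input[None], target[None])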