torch CrossEntropyLoss calculation difference between 2D input and 3D input

Question


I am running a test on torch.nn.CrossEntropyLoss, using the example shown on the official page.

import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=False)
target = torch.randn(3, 5).softmax(dim=1)
output = loss(input, target)

The output is 2.05.
In the example, both the input and the target are 2D tensors. Since in most NLP cases the input is a 3D tensor, and correspondingly the target should be 3D as well, I wrote a couple of lines of test code and found a weird issue.

input = torch.stack([input])    # add a leading dimension: shape (1, 3, 5)
target = torch.stack([target])
output = loss(input, target)

The output is 0.9492.
This result really confuses me: apart from the extra dimension, the numbers inside the tensors are exactly the same. Does anyone know the reason for the difference?

The reason I am testing this method is that I am working on a project with Transformers.BartForConditionalGeneration. The loss in the model output always has shape (1,), which is confusing: if my batch size is greater than 1, I would expect one loss value per sample instead of a single number. I took a look at the code, and it simply uses nn.CrossEntropyLoss(), so I suspect the issue may lie in nn.CrossEntropyLoss(); however, I got stuck inside that method.
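
As a side note on the (1,) shape: by default nn.CrossEntropyLoss reduces over all samples with reduction='mean', which is why a single scalar comes back regardless of batch size. Below is a minimal sketch (an editorial illustration, not part of the original post) of how the reduction argument controls this:

import torch
import torch.nn as nn

input = torch.randn(3, 5)
target = torch.randn(3, 5).softmax(dim=1)

# Default reduction='mean' collapses everything into a single scalar.
scalar_loss = nn.CrossEntropyLoss()(input, target)

# reduction='none' keeps one loss value per sample instead.
per_sample = nn.CrossEntropyLoss(reduction='none')(input, target)

print(scalar_loss.shape)  # torch.Size([])  -- a single number
print(per_sample.shape)   # torch.Size([3]) -- one value per row
print(torch.isclose(per_sample.mean(), scalar_loss))  # tensor(True)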

Answer 1

Score: 0


In the second case, you are adding an extra dimension, which means the softmax over the logits tensor (input) is no longer applied along the same dimension of the data: nn.CrossEntropyLoss always treats dimension 1 as the class dimension, so for the stacked (1, 3, 5) input the softmax runs over the axis of size 3 (the original batch axis) instead of the axis of size 5.

Here we compute the two quantities separately:

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=False)
>>> target = torch.randn(3, 5).softmax(dim=1)

First, you have loss(input, target), which is identical to:

>>> o = -target*F.log_softmax(input, 1)
>>> o.sum(1).mean()

And your second scenario, loss(input[None], target[None]), is identical to:

>>> o = -target[None]*F.log_softmax(input[None], 1)
>>> o.sum(1).mean()
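
To get the same number for an NLP-style 3D input, the class dimension has to stay where nn.CrossEntropyLoss expects it. Here is a minimal sketch (my own addition, with illustrative shapes, not from the original answer) assuming logits of shape (batch, seq_len, num_classes):

import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()
logits = torch.randn(2, 7, 5)                   # (batch, seq_len, num_classes)
probs = torch.randn(2, 7, 5).softmax(dim=-1)    # target probabilities per token

# Option 1: move the class dimension to position 1, i.e. (batch, num_classes, seq_len).
out_a = loss(logits.permute(0, 2, 1), probs.permute(0, 2, 1))

# Option 2: flatten (batch, seq_len) into a single N dimension.
out_b = loss(logits.reshape(-1, 5), probs.reshape(-1, 5))

print(torch.isclose(out_a, out_b))  # tensor(True): both average over all tokens

This is also why the stacked (1, 3, 5) call above gives a different number: after stacking, dimension 1 is the old batch axis of size 3, so the softmax normalizes over the wrong axis.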
