April 11, 2023, 02:18:38

PyTorch's nn.BCEWithLogitsLoss() behaves totally differently than nn.BCELoss()

Question

I'm totally new to PyTorch. I was taking an e-course and experimenting with PyTorch when I came across two loss functions (the motivation for comparing them is numerical stability with logits):

> nn.BCEWithLogitsLoss()

and
> nn.BCELoss()

With the appropriate adjustments to the code for each of these two loss functions, I got quite different accuracy curves!
For example, with nn.BCELoss(), as in the snippet below:

model = nn.Sequential(
    nn.Linear(D, 1),
    nn.Sigmoid()
)

criterion = nn.BCELoss()

The accuracy plot was:
[image: accuracy plot with nn.BCELoss()]

And for nn.BCEWithLogitsLoss(), as below:

model = nn.Linear(D, 1)
criterion = nn.BCEWithLogitsLoss()

The accuracy plot was:
[image: accuracy plot with nn.BCEWithLogitsLoss()]

The rest of the code is the same for both examples. (Note that the loss curves were similar and decent.)
The learning curves for both snippets looked something like this:
[image: learning curves]
I couldn't figure out what is causing this problem (whether there is a bug in my code or something wrong with my PyTorch installation).
Thank you in advance for your time and help.

Answer 1

Score: 1

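nn.BCELoss() expects your model's output to be probabilities, that is, values that have already passed through a sigmoid activation.
nn.BCEWithLogitsLoss() expects your model's output to be logits, that is, raw values without the sigmoid activation (the loss applies the sigmoid internally).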
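
To make the relationship between the two losses concrete, here is a minimal sketch in plain Python. It is not PyTorch's actual implementation: the helper names `bce_from_probability`, `bce_from_logit`, and `sigmoid` are made up for illustration, and the logit-based formula is a commonly used numerically stable rewrite of binary cross-entropy. It shows that both formulations give the same value for moderate logits, and why working from the logit is safer for extreme ones.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_probability(p, y):
    """Binary cross-entropy on a probability p in (0, 1), the kind of input nn.BCELoss expects."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(z, y):
    """The same loss computed directly from the logit z, in a numerically stable form."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# For moderate logits the two formulations agree.
for z, y in [(0.3, 1.0), (-2.0, 0.0), (1.5, 0.0)]:
    assert math.isclose(bce_from_probability(sigmoid(z), y), bce_from_logit(z, y))

# For a large logit, sigmoid saturates to exactly 1.0 in floating point,
# so the probability-based form would have to evaluate log(0) when y == 0.
z_big = 40.0
print(sigmoid(z_big) == 1.0)       # True
print(bce_from_logit(z_big, 0.0))  # approximately 40.0
```

So the two setups are meant to produce the same loss values; the logits version mainly avoids the saturation problem.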

I suspect you computed something incorrectly (for example, the accuracy). Here is a simple example based on your code:

With probabilities:

import torch
from torch import nn

# Toy data: the label is 1 when the single feature is positive, else 0
dummy_x = torch.randn(1000, 1)
dummy_y = (dummy_x > 0).type(torch.float)

# The model ends with a sigmoid, so it outputs probabilities for nn.BCELoss
model1 = nn.Sequential(
    nn.Linear(1, 1),
    nn.Sigmoid()
)
criterion1 = nn.BCELoss()
optimizer = torch.optim.Adam(model1.parameters(), 0.001)

def binary_accuracy(preds, y, logits=False):
    # Logits must be converted to probabilities before rounding to 0/1 labels
    if logits:
        rounded_preds = torch.round(torch.sigmoid(preds))
    else:
        rounded_preds = torch.round(preds)
    correct = (rounded_preds == y).float()
    accuracy = correct.sum() / len(y)
    return accuracy

for e in range(2000):
    y_hat = model1(dummy_x)
    loss = criterion1(y_hat, dummy_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Report loss and accuracy every 100 epochs
    if e != 0 and e % 100 == 0:
        print(f"Epoch: {e}, Loss: {loss:.4f}")
        print(f"Epoch: {e}, Acc: {binary_accuracy(y_hat, dummy_y)}")

#Result:
Epoch: 100, Loss: 0.5840
Epoch: 100, Acc: 0.5839999914169312
Epoch: 200, Loss: 0.5423
Epoch: 200, Acc: 0.6499999761581421
...
Epoch: 1800, Loss: 0.2862
Epoch: 1800, Acc: 0.9950000047683716
Epoch: 1900, Loss: 0.2793
Epoch: 1900, Acc: 0.9929999709129333

Now with logits:

# No sigmoid layer here: the model outputs raw logits for nn.BCEWithLogitsLoss
model2 = nn.Linear(1, 1)
criterion2 = nn.BCEWithLogitsLoss()
optimizer2 = torch.optim.Adam(model2.parameters(), 0.001)
for e in range(2000):
    y_hat = model2(dummy_x)
    loss = criterion2(y_hat, dummy_y)
    optimizer2.zero_grad()
    loss.backward()
    optimizer2.step()

    # logits=True makes binary_accuracy apply the sigmoid before rounding
    if e != 0 and e % 100 == 0:
        print(f"Epoch: {e}, Loss: {loss:.4f}")
        print(f"Epoch: {e}, Acc: {binary_accuracy(y_hat, dummy_y, logits=True)}")

#Results: 
Epoch: 100, Loss: 1.1042
Epoch: 100, Acc: 0.007000000216066837
Epoch: 200, Loss: 1.0484
Epoch: 200, Acc: 0.01899999938905239
...
Epoch: 1800, Loss: 0.5019
Epoch: 1800, Acc: 0.9879999756813049
Epoch: 1900, Loss: 0.4844
Epoch: 1900, Acc: 0.9879999756813049
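
The question does not show how accuracy was computed, so the following is only a guess at the cause: if the raw model output is rounded directly, the result is only meaningful when that output is a probability. A small sketch with hand-picked, hypothetical logits and labels (the `accuracy` helper is also made up for illustration) shows how much the number can change:

```python
import torch

# Hypothetical logits and labels, chosen by hand for illustration.
logits = torch.tensor([[-3.2], [-0.4], [0.3], [2.7], [5.1]])
labels = torch.tensor([[0.0], [0.0], [1.0], [1.0], [1.0]])

def accuracy(pred_labels, y):
    """Fraction of predicted labels that match the targets."""
    return (pred_labels == y).float().mean().item()

# Pitfall: rounding raw logits yields values such as -3., 3., 5., which are not class labels.
acc_rounded_logits = accuracy(torch.round(logits), labels)

# Apply the sigmoid first, then threshold at 0.5 ...
acc_sigmoid = accuracy((torch.sigmoid(logits) > 0.5).float(), labels)

# ... or, equivalently, threshold the logits at 0, because sigmoid(0) == 0.5.
acc_threshold = accuracy((logits > 0).float(), labels)

print(acc_rounded_logits, acc_sigmoid, acc_threshold)
```

Here rounding the logits directly matches only one of the five labels, while both thresholding variants match all five. Thresholding the logits at 0 also skips the sigmoid call when only the predicted class is needed.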

Answer 2

Score: 0

You need to modify the code according to the loss function (aka criterion) you are using.
For BCELoss: since you are using the sigmoid layer in your model, the outputs are between 0 and 1.

For BCEWithLogitsLoss: the output is the logit. A logit can be negative or positive. The logit is z, where

z = w1*x1 + w2*x2 + ... + wn*xn

So, to get predictions when using BCEWithLogitsLoss, you need to pass this output through a sigmoid before calculating the accuracy. For this you can write a small function that returns

1/(1+np.exp(-np.dot(x,w)))
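
As a rough sketch of that idea in PyTorch rather than NumPy: the layer size, seed, and input values below are made up for illustration. Note that `nn.Linear` also adds a bias term by default, which the formula above leaves out.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)   # hypothetical model with D = 3 input features
x = torch.randn(4, 3)     # 4 made-up samples

with torch.no_grad():
    z = layer(x)                                 # logits from the model
    z_manual = x @ layer.weight.T + layer.bias   # z = w1*x1 + ... + wn*xn + b
    probs = 1 / (1 + torch.exp(-z))              # sigmoid written out by hand
    preds = (probs > 0.5).float()                # 0/1 labels to compare with targets

print(torch.allclose(z, z_manual, atol=1e-6))
print(torch.allclose(probs, torch.sigmoid(z)))
```

In practice `torch.sigmoid` does the same job as the hand-written expression, so a separate helper is usually not needed.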

Hope this helps!
