PyTorch's nn.BCEWithLogitsLoss() behaves totally differently than nn.BCELoss()

Question

I'm totally new to PyTorch. I was taking an e-course and experimenting with PyTorch when I came across two loss functions (the stated reason for choosing between them is numerical stability when working with logits):

> nn.BCEWithLogitsLoss()

and

> nn.BCELoss()

With the appropriate adjustments to the code for each of these two loss functions, I got quite different accuracy curves!
For example, with nn.BCELoss(), as in the code snippet below:

model = nn.Sequential(
    nn.Linear(D, 1),
    nn.Sigmoid()
)

criterion = nn.BCELoss()

The accuracy plot was:
[image: accuracy plot for the nn.BCELoss() model]

And for nn.BCEWithLogitsLoss(), as below:

model = nn.Linear(D, 1)
criterion = nn.BCEWithLogitsLoss()

The accuracy plot was:
[image: accuracy plot for the nn.BCEWithLogitsLoss() model]

The rest of the code is the same for both examples. (Note that the loss curves were similar and decent.)
The learning curves for both snippets looked something like this:
[image: learning curves]
I couldn't figure out what is causing this problem (whether there is a bug in my code or something wrong with my PyTorch installation).
Thank you in advance for your time and help.

Answer 1

Score: 1

nn.BCELoss() expects your model's output to be probabilities, that is, values that have already passed through the sigmoid activation.
nn.BCEWithLogitsLoss() expects your model's output to be logits, that is, raw values without the sigmoid activation; it applies the sigmoid internally.
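
To make the relationship between the two losses concrete, here is a minimal plain-Python sketch (standard library only, one sample at a time). The helper names `sigmoid`, `bce_from_prob`, and `bce_from_logit` are made up for illustration, and the logit form uses the usual stable rewrite `max(z, 0) - z*y + log(1 + exp(-|z|))`; this is an assumption about how such a loss is typically written, not a copy of PyTorch's source.

```python
import math

def sigmoid(z):
    # Logistic function, written so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def bce_from_prob(p, y):
    # Binary cross-entropy computed from a probability p (what a BCELoss-style loss sees).
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def bce_from_logit(z, y):
    # The same quantity computed directly from the logit z, in a numerically stable form.
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# For moderate logits both routes give the same loss value.
for z, y in [(-2.0, 0.0), (0.5, 1.0), (3.0, 0.0)]:
    a = bce_from_prob(sigmoid(z), y)
    b = bce_from_logit(z, y)
    print(f"z={z:+.1f} y={y:.0f} prob-form={a:.6f} logit-form={b:.6f}")

# For a large logit the sigmoid saturates to exactly 1.0 in float64,
# so the probability route ends up taking log(0) and fails.
try:
    bce_from_prob(sigmoid(40.0), 0.0)
except ValueError as err:
    print("prob-form failed:", err)
print("logit-form:", bce_from_logit(40.0, 0.0))
```

The point of the sketch is that the two losses measure the same thing; they differ only in what input they expect and in how well they cope with extreme values.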

I suspect something was calculated incorrectly (for example, the accuracy). Here is a simple example based on your code:

With probabilities:

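import torch
from torch import nn

# Toy data: one feature per sample; the label is 1 when the feature is positive.
dummy_x = torch.randn(1000, 1)
dummy_y = (dummy_x > 0).type(torch.float)

# The model ends in a sigmoid, so it outputs probabilities, as nn.BCELoss() expects.
model1 = nn.Sequential(
    nn.Linear(1, 1),
    nn.Sigmoid()
)
criterion1 = nn.BCELoss()
optimizer = torch.optim.Adam(model1.parameters(), lr=0.001)

def binary_accuracy(preds, y, logits=False):
    if logits:
        # Raw logits must be turned into probabilities before rounding.
        rounded_preds = torch.round(torch.sigmoid(preds))
    else:
        # Probabilities can be rounded to 0 or 1 directly.
        rounded_preds = torch.round(preds)
    correct = (rounded_preds == y).float()
    accuracy = correct.sum() / len(y)
    return accuracy

# Full-batch training on the toy data.
for e in range(2000):
    y_hat = model1(dummy_x)
    loss = criterion1(y_hat, dummy_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Report loss and accuracy every 100 epochs.
    if e != 0 and e % 100 == 0:
        print(f"Epoch: {e}, Loss: {loss:.4f}")
        print(f"Epoch: {e}, Acc: {binary_accuracy(y_hat, dummy_y)}")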

#Result:
Epoch: 100, Loss: 0.5840
Epoch: 100, Acc: 0.5839999914169312
Epoch: 200, Loss: 0.5423
Epoch: 200, Acc: 0.6499999761581421
...
Epoch: 1800, Loss: 0.2862
Epoch: 1800, Acc: 0.9950000047683716
Epoch: 1900, Loss: 0.2793
Epoch: 1900, Acc: 0.9929999709129333

Now with logits:

# No sigmoid layer here: the model outputs raw logits, as nn.BCEWithLogitsLoss() expects.
model2 = nn.Linear(1, 1)
criterion2 = nn.BCEWithLogitsLoss()
optimizer2 = torch.optim.Adam(model2.parameters(), lr=0.001)
for e in range(2000):
    y_hat = model2(dummy_x)
    loss = criterion2(y_hat, dummy_y)
    optimizer2.zero_grad()
    loss.backward()
    optimizer2.step()

    if e != 0 and e % 100 == 0:
        print(f"Epoch: {e}, Loss: {loss:.4f}")
        # logits=True makes binary_accuracy apply the sigmoid before rounding.
        print(f"Epoch: {e}, Acc: {binary_accuracy(y_hat, dummy_y, logits=True)}")

#Results: 
Epoch: 100, Loss: 1.1042
Epoch: 100, Acc: 0.007000000216066837
Epoch: 200, Loss: 1.0484
Epoch: 200, Acc: 0.01899999938905239
...
Epoch: 1800, Loss: 0.5019
Epoch: 1800, Acc: 0.9879999756813049
Epoch: 1900, Loss: 0.4844
Epoch: 1900, Acc: 0.9879999756813049
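
One way an accuracy curve can look wrong while the loss looks fine is rounding raw logits as if they were probabilities. This is only a guess at the kind of mistake meant above; nothing in the question confirms it. The plain-Python sketch below uses made-up logits and two hypothetical helper functions to show how far apart the two accuracy numbers can be for a model whose predictions are all correct.

```python
def accuracy_rounding_raw_outputs(outputs, targets):
    # Rounds whatever the model returns; only meaningful if outputs are probabilities.
    correct = sum(round(o) == t for o, t in zip(outputs, targets))
    return correct / len(targets)

def accuracy_thresholding_logits(logits, targets):
    # A logit above 0 corresponds to a probability above 0.5.
    correct = sum((1.0 if z > 0.0 else 0.0) == t for z, t in zip(logits, targets))
    return correct / len(targets)

# Hypothetical logits from a model that classifies every sample correctly.
logits = [-2.3, -0.4, 0.7, 1.9, -1.1, 3.2]
targets = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]

print("rounding raw logits:", accuracy_rounding_raw_outputs(logits, targets))
print("thresholding logits:", accuracy_thresholding_logits(logits, targets))
```

Rounding a logit such as 1.9 gives 2, which matches neither label, so a confident and correct model is scored as wrong. The `logits=True` branch of `binary_accuracy` avoids this by applying the sigmoid first.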

Answer 2

Score: 0

You need to modify the code according to the loss function (aka criterion) you are using.
For BCELoss: since your model ends with a sigmoid layer, the outputs are between 0 and 1.

For BCEWithLogitsLoss: the output is the logit, which can be negative or positive. The logit is z, where

z = w1*x1 + w2*x2 + ... + wn*xn

So, to get predictions while using BCEWithLogitsLoss, you need to pass this output through a sigmoid. For this you can create a small function that returns

1/(1+np.exp(-np.dot(x,w)))

and only then calculate the accuracy.
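
As a rough illustration of that small function, here is a dependency-free sketch that uses `math` instead of NumPy. The names `predict_proba` and `predict_label`, the bias term `b`, and the weights and samples are all assumptions made for this example; none of them come from the question's code.

```python
import math

def predict_proba(x, w, b=0.0):
    # Logit z = w1*x1 + ... + wn*xn (+ an optional bias b, an addition not in the formula above).
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # The sigmoid turns the logit into a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(x, w, b=0.0):
    # Class 1 when the probability exceeds 0.5, otherwise class 0.
    return 1.0 if predict_proba(x, w, b) > 0.5 else 0.0

w = [0.8, -1.5]  # hypothetical learned weights
samples = [[2.0, 0.5], [0.1, 1.0], [-1.0, -2.0]]

for x in samples:
    print(x, round(predict_proba(x, w), 4), predict_label(x, w))
```

Accuracy is then the fraction of samples whose predicted label equals the target. Because the sigmoid is monotonic, checking whether the logit is greater than 0 yields the same labels without computing the sigmoid at all.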

Hope this helps!!!

huangapple
  • Published on April 11, 2023, 02:18:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75979632.html